Abstract

The distribution semantics is one of the most prominent approaches for the combination 
of logic programming and probability theory. Many languages follow this semantics, such 
as Independent Choice Logic, PRISM, pD, Logic Programs with Annotated Disjunctions 
(LPADs) and ProbLog. 

When a program contains function symbols, the distribution semantics is well-defined
only if the set of explanations for a query is finite and so is each explanation.
Well-definedness is usually either explicitly imposed or is achieved by severely limiting the
class of allowed programs. In this paper we identify a larger class of programs for which the
semantics is well-defined together with an efficient procedure for computing the probability
of queries. Since LPADs offer the most general syntax, we present our results for them,
but our results are applicable to all languages under the distribution semantics.

We present the algorithm "Probabilistic Inference with Tabling and Answer subsumption"
(PITA) that computes the probability of queries by transforming a probabilistic
program into a normal program and then applying SLG resolution with answer
subsumption. PITA has been implemented in XSB and tested on six domains: two with function
symbols and four without. The execution times are compared with those of ProbLog,
cplint and CVE. PITA was almost always able to solve larger problems in a shorter time,
on domains with and without function symbols.

KEYWORDS: Probabilistic Logic Programming, Tabling, Answer Subsumption, Logic 
Programs with Annotated Disjunction, Program Transformation 
1 Introduction

Many real world domains can only be represented effectively if we are able to model 
uncertainty. Accordingly, there has been an increased interest in logic languages 
representing probabilistic information, stemming in part from their successful use 
in Machine Learning. In particular, languages that follow the distribution semantics 
(Sato 1995) have received much attention in the last few years. In these languages
a theory defines a probability distribution over logic programs, which is extended 
to a joint distribution over programs and queries. The probability of a query is then 
obtained by marginalizing out the programs. 

Examples of languages that follow the distribution semantics are Independent
Choice Logic (Poole 1997), PRISM (Sato and Kameya 1997), pD (Fuhr 2000), Logic
Programs with Annotated Disjunctions (LPADs) (Vennekens et al. 2004) and ProbLog
(De Raedt et al. 2007). All these languages have the same expressive power, as a
theory in one language can be translated into another (Vennekens and Verbaeten 2003;
De Raedt et al. 2008). LPADs offer the most general syntax, as the constructs of all
the other languages can be directly encoded in LPADs.

When programs contain function symbols, the distribution semantics has to be
defined in a slightly different way, as proposed in (Sato 1995) and (Poole 1997): the
probability of a query is defined with reference to a covering set of explanations
for the query. For the semantics to be well-defined, both the covering set and each
explanation it contains must be finite. To ensure that the semantics is well-defined,
(Poole 1997) requires programs to be acyclic, while (Sato and Kameya 1997) directly
imposes the condition that queries must have a finite covering set of finite
explanations.

Since acyclicity is a strong requirement ruling out many interesting programs, in 
this paper we propose a looser requirement to ensure the well-definedness of the 
semantics. We introduce a definition of bounded term-size programs and queries, 
which are based on a characterization of the Well-Founded Semantics in terms of an
iterated fixpoint (Przymusinski 1989). A bounded term-size program is such that
in each iteration of the fixpoint the size of true atoms does not grow indefinitely. 
A bounded term-size query is such that the portion of the program relevant to 
the query is bounded term-size. We show that if a query is bounded term-size, 
then it has a finite set of finite explanations that are covering, so the semantics is 
well-defined. 

We also present the algorithm "Probabilistic Inference with Tabling and Answer
subsumption" (PITA) that builds explanations for every subgoal encountered
during a derivation of a query. The explanations are compactly represented using
Binary Decision Diagrams (BDDs) that also allow an efficient computation of the
probability. Specifically, PITA transforms the input LPAD into a normal logic program
in which the subgoals have an extra argument storing a BDD that represents
the explanations for its answers. As its name implies, PITA uses tabling to store
explanations for a goal. Tabling has already been shown useful for probabilistic
logic programming in (Kameya and Sato 2000; Riguzzi 2008; Kimmig et al. 2009;
Mantadelis and Janssens 2010; Riguzzi and Swift 2011). However, PITA is novel in
its exploitation of a tabling feature called answer subsumption to combine
explanations coming from different clauses.

PITA draws inspiration from (De Raedt et al. 2007), which first proposed to use
BDDs for computing the probability of queries for the ProbLog language, a
minimalistic probabilistic extension of Prolog, and from (Riguzzi 2007), which applied
BDDs to the more general LPAD syntax. Other approaches for reasoning on LPADs
include (Riguzzi 2008), where SLG resolution is extended by repeatedly branching
on disjunctive clauses, and the CVE system (Meert et al. 2009), which transforms
LPADs into an equivalent Bayesian network and then performs inference on the
network using the variable elimination algorithm.

PITA was tested on a number of datasets, both with and without function symbols,
in order to evaluate its efficiency. The execution times of PITA were compared
with those of cplint (Riguzzi 2007), CVE (Meert et al. 2009) and ProbLog
(Kimmig et al. 2011). In most cases PITA was able to solve more complex queries
than the other algorithms, and it was also almost always faster, both
on datasets with and without function symbols.

The paper is organized as follows. Section 2 illustrates the syntax and semantics
of LPADs over finite universes. Section 3 discusses the semantics of LPADs with
function symbols. Section 4 defines dynamic stratification for LPADs, provides
conditions for the well-definedness of the LPAD semantics with function symbols, and
discusses related work on termination of normal programs. Section 5 gives an
introduction to BDDs. Section 6 briefly recalls tabling and answer subsumption. Section
7 presents PITA and Section 8 shows its correctness. Section 9 discusses related
work. Section 10 describes the experiments and Section 11 discusses the results and
presents directions for future work.

2 The Distribution Semantics for Function-free Programs 

In this section we illustrate the distribution semantics for function-free programs
using LPADs as the prototype of the languages following this semantics. 

A Logic Program with Annotated Disjunctions (Vennekens et al. 2004) consists
of a finite set of annotated disjunctive clauses of the form

$$H_1 : \alpha_1 \vee \ldots \vee H_n : \alpha_n \leftarrow L_1, \ldots, L_m.$$

In such a clause $H_1, \ldots, H_n$ are logical atoms, $L_1, \ldots, L_m$ logical literals, and
$\alpha_1, \ldots, \alpha_n$ real numbers in the interval $[0, 1]$ such that $\sum_{i=1}^{n} \alpha_i \le 1$.
The term $H_1 : \alpha_1 \vee \ldots \vee H_n : \alpha_n$ is called the head and $L_1, \ldots, L_m$ is called
the body. Note that if $n = 1$ and $\alpha_1 = 1$ a clause corresponds to a normal program
clause, also called a non-disjunctive clause. If $\sum_{i=1}^{n} \alpha_i < 1$, the head of the clause
implicitly contains an extra atom null that does not appear in the body of any clause
and whose annotation is $1 - \sum_{i=1}^{n} \alpha_i$. For a clause $C$, we define $head(C)$ as
$\{(H_i : \alpha_i) \mid 1 \le i \le n\}$ if $\sum_{i=1}^{n} \alpha_i = 1$, and as
$\{(H_i : \alpha_i) \mid 1 \le i \le n\} \cup \{(null : 1 - \sum_{i=1}^{n} \alpha_i)\}$ otherwise.
Moreover, we define $body(C)$ as $\{L_i \mid 1 \le i \le m\}$, $H_i(C)$ as $H_i$ and $\alpha_i(C)$ as $\alpha_i$.

If the LPAD is ground, a clause represents a probabilistic choice between the
non-disjunctive clauses obtained by selecting only one atom in the head. As usual,
if the LPAD T is not ground, T is assigned a meaning by computing its grounding,
ground(T).

By choosing a head atom for each ground clause of an LPAD we get a normal
logic program called a world of the LPAD (an instance of the LPAD in
(Vennekens et al. 2004)). A probability distribution is defined over the space of
worlds by assuming independence between the choices made for each clause.

More specifically, an atomic choice is a triple $(C, \theta, i)$ where $C \in T$, $\theta$ is a minimal
substitution that grounds $C$ and $i \in \{1, \ldots, |head(C)|\}$. $(C, \theta, i)$ means that, for
the ground clause $C\theta$, the head $H_i(C)$ was chosen. A set of atomic choices $\kappa$ is
consistent if $(C, \theta, i) \in \kappa, (C, \theta, j) \in \kappa \Rightarrow i = j$, i.e., only one head is selected for
a ground clause. A composite choice $\kappa$ is a consistent set of atomic choices. The
probability $P(\kappa)$ of a composite choice $\kappa$ is the product of the probabilities of the
individual atomic choices, i.e. $P(\kappa) = \prod_{(C, \theta, i) \in \kappa} \alpha_i(C)$.
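The two definitions above (consistency and $P(\kappa)$) can be checked mechanically. The following is a minimal Python sketch under assumptions of our own: an atomic choice is encoded as a tuple `(clause, theta, i)`, a substitution `theta` as a tuple of `(variable, term)` bindings, and the annotations of each clause (with the implicit null atom listed last) as a dictionary `alpha`; all names are hypothetical. The annotations used are those of clauses C1 and C2 of Example 1.

```python
from math import prod

# Hypothetical encoding: an atomic choice is (clause, theta, i), where theta is a
# tuple of (variable, term) bindings and i is a 1-based head index; alpha[clause]
# lists the head annotations, with the implicit null atom last.


def is_consistent(kappa):
    """True if no ground clause C-theta receives two different heads in kappa."""
    chosen = {}
    for clause, theta, i in kappa:
        if chosen.setdefault((clause, theta), i) != i:
            return False
    return True


def choice_probability(kappa, alpha):
    """P(kappa): the product of alpha_i(C) over the atomic choices (C, theta, i)."""
    if not is_consistent(kappa):
        raise ValueError("a composite choice must be consistent")
    return prod(alpha[clause][i - 1] for clause, _, i in kappa)


# Annotations of clauses C1 and C2 of Example 1.
alpha = {"C1": [0.3, 0.5, 0.2], "C2": [0.2, 0.6, 0.2]}
theta = (("X", "david"),)
print(is_consistent({("C1", theta, 1), ("C1", theta, 2)}))  # False
print(choice_probability({("C1", theta, 2), ("C2", theta, 2)}, alpha))  # 0.3
```

Keying consistency on the pair (clause, substitution) mirrors the definition: the same clause may appear several times in a composite choice as long as each ground instance gets a single head.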

A selection $\sigma$ is a composite choice that, for each clause $C\theta$ in ground(T), contains
an atomic choice $(C, \theta, i)$. Since T does not contain function symbols,
ground(T) is finite and so is each $\sigma$. We denote the set of all selections $\sigma$ of a
program T by $S_T$. A selection $\sigma$ identifies a normal logic program $w_\sigma$ called a world of
T, defined as $w_\sigma = \{(H_i(C) \leftarrow body(C))\theta \mid (C, \theta, i) \in \sigma\}$. $W_T$ denotes the set of all
the worlds of T. Since selections are composite choices, we can assign a probability
to worlds: $P(w_\sigma) = P(\sigma) = \prod_{(C, \theta, i) \in \sigma} \alpha_i(C)$.

Throughout this paper, we consider only sound LPADs, in which every world has
a total model according to the Well-Founded Semantics (WFS) (Van Gelder et al. 1991).
In this way, uncertainty is modeled only by means of the disjunctions in the head
and not by the semantics of negation. Thus in the following, $w_\sigma \models A$ means that
the ground atom A is true in the well-founded model of the program $w_\sigma$.

In order to define the probability of an atom A being true in an LPAD T, note that
the probability distribution over possible worlds induces a probability distribution
over Herbrand interpretations by assuming $P(I|w) = 1$ if $I$ is the well-founded
model of $w$ ($I = WFM(w)$) and 0 otherwise. We can thus compute the probability
of an interpretation $I$ as

$$P(I) = \sum_{w \in W_T} P(I, w) = \sum_{w \in W_T} P(I|w) P(w) = \sum_{w \in W_T,\, I = WFM(w)} P(w)$$

We can extend the probability distribution on interpretations to ground atoms by
assuming $P(a_j|I) = 1$ if $A_j$ belongs to $I$ and 0 otherwise, where $A_j$ is a ground
atom of the Herbrand base $\mathcal{H}_T$ and $a_j$ stands for $A_j = true$. Thus the probability
of a ground atom $A_j$ being true, according to an LPAD T, can be obtained as

$$P(a_j) = \sum_{I} P(a_j, I) = \sum_{I} P(a_j|I) P(I) = \sum_{I \subseteq \mathcal{H}_T,\, A_j \in I} P(I)$$

Alternatively, we can extend the probability distribution on programs to ground
atoms by assuming $P(a_j|w) = 1$ if $A_j$ is true in $w$¹ and 0 otherwise. Thus the
probability of $A_j$ being true is

$$P(a_j) = \sum_{w \in W_T} P(a_j, w) = \sum_{w \in W_T} P(a_j|w) P(w) = \sum_{w \in W_T,\, w \models A_j} P(w).$$

¹ We sometimes abuse notation slightly by saying that an atom A is true in a world w to indicate
that A is true in the (unique) well-founded model of w.

The probability of $A_j$ being false is defined similarly.
Example 1 

Consider the dependency of sneezing on having the flu or hay fever:

C1 = strong_sneezing(X) : 0.3 ∨ moderate_sneezing(X) : 0.5 ← flu(X).

C2 = strong_sneezing(X) : 0.2 ∨ moderate_sneezing(X) : 0.6 ← hay_fever(X).

C3 = flu(david).

C4 = hay_fever(david).
This program models the fact that sneezing can be caused by flu or hay fever. The
query moderate_sneezing(david) is true in 5 of the 9 worlds of the program and
its probability of being true is

$$P_T(\mathit{moderate\_sneezing}(david)) = 0.5 \cdot 0.2 + 0.5 \cdot 0.6 + 0.5 \cdot 0.2 + 0.3 \cdot 0.6 + 0.2 \cdot 0.6 = 0.8$$
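Because the grounding of Example 1 is finite, the sum above can be reproduced by brute-force enumeration of the worlds. The Python sketch below is an illustration written specifically for this example, not a general LPAD evaluator: it assumes a hypothetical encoding of C1 and C2 as lists of (head atom, annotation) pairs with the null atom last, and it relies on the fact that C3 and C4 make every clause body true, so the well-founded model of a world is just the facts plus the chosen heads.

```python
from itertools import product

# Hypothetical encoding of the grounding of Example 1: each disjunctive clause is a
# list of (head atom, annotation) pairs, with the implicit null atom last.
CLAUSES = {
    "C1": [("strong_sneezing(david)", 0.3), ("moderate_sneezing(david)", 0.5), ("null", 0.2)],
    "C2": [("strong_sneezing(david)", 0.2), ("moderate_sneezing(david)", 0.6), ("null", 0.2)],
}
# C3 and C4 are facts, so the bodies flu(david) and hay_fever(david) hold in every
# world; the well-founded model of a world is then the facts plus the chosen heads.
FACTS = {"flu(david)", "hay_fever(david)"}


def worlds():
    """Yield (selection, P(w), true atoms of w) for each of the 3 * 3 worlds."""
    names = list(CLAUSES)
    for picks in product(*(range(len(CLAUSES[n])) for n in names)):
        prob, model = 1.0, set(FACTS)
        for name, k in zip(names, picks):
            atom, alpha = CLAUSES[name][k]
            prob *= alpha
            if atom != "null":
                model.add(atom)
        yield {n: k + 1 for n, k in zip(names, picks)}, prob, model


def query_probability(atom):
    """P(atom): sum of P(w) over the worlds w whose model contains atom."""
    return sum(p for _, p, model in worlds() if atom in model)


print(round(query_probability("moderate_sneezing(david)"), 10))  # 0.8
```

The enumeration is exponential in the number of ground disjunctive clauses, which is one motivation for the explanation-based approach developed in the following sections.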

Even though we assume independence between the choices for individual ground clauses,
this is not a restriction, in the sense that it still allows us to represent
all the joint distributions of atoms of the Herbrand base that are representable with
a Bayesian network over those variables. Details of the proof are omitted for lack
of space.



3 The Distribution Semantics for Programs with Function Symbols 

If a non-ground LPAD T contains function symbols, then the semantics given in
the previous section is not well-defined. In this case, each world is the result of
an infinite number of choices and the probability $P(w_\sigma)$ is 0, since it is given by the
product of an infinite number of factors all smaller than 1. Thus, the probability
of a formula is 0 as well, since it is a sum of terms all equal to 0. The distribution
semantics with function symbols was defined in (Sato 1995) and (Poole 2000). Here
we follow the approach of (Poole 2000).

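A composite choice $\kappa$ identifies a set of worlds $\omega_\kappa$ that contains all the worlds
associated to a selection that is a superset of $\kappa$, i.e., $\omega_\kappa = \{w_\sigma \mid \sigma \in S_T, \sigma \supseteq \kappa\}$. We
define the set of worlds identified by a set of composite choices $K$ as $\omega_K = \bigcup_{\kappa \in K} \omega_\kappa$.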

Given a ground atom A, we define the notion of explanation, covering set of composite
choices and mutually incompatible set of explanations. A composite choice
$\kappa$ is an explanation for A if A is true in every world of $\omega_\kappa$. In Example 1 the composite
choice $\{(C_1, \{X/david\}, 1)\}$ is an explanation for strong_sneezing(david). A
set of composite choices $K$ is covering with respect to A if every world $w_\sigma$ in which
A is true is such that $w_\sigma \in \omega_K$. In Example 1 the set of composite choices

$$K_1 = \{\{(C_1, \{X/david\}, 2)\}, \{(C_2, \{X/david\}, 2)\}\} \quad (1)$$

is covering for moderate_sneezing(david). Two composite choices $\kappa_1$ and $\kappa_2$ are
incompatible if their union is inconsistent, i.e., if there exists a clause $C$ and a
substitution $\theta$ grounding $C$ such that $(C, \theta, j) \in \kappa_1$, $(C, \theta, k) \in \kappa_2$ and $j \ne k$. A
set $K$ of composite choices is mutually incompatible if for all $\kappa_1 \in K$, $\kappa_2 \in K$,
$\kappa_1 \ne \kappa_2 \Rightarrow \kappa_1$ and $\kappa_2$ are incompatible. As an illustration, the set of composite choices

$$K_2 = \{\{(C_1, \{X/david\}, 2), (C_2, \{X/david\}, 1)\}, \{(C_1, \{X/david\}, 2), (C_2, \{X/david\}, 3)\}, \{(C_2, \{X/david\}, 2)\}\} \quad (2)$$

is mutually incompatible for the theory of Example 1. (Poole 2000) proved the
following results:

• Given a finite set $K$ of finite composite choices, there exists a finite set $K'$ of
mutually incompatible finite composite choices such that $\omega_K = \omega_{K'}$.

• If $K_1$ and $K_2$ are both mutually incompatible finite sets of finite composite
choices such that $\omega_{K_1} = \omega_{K_2}$, then $\sum_{\kappa \in K_1} P(\kappa) = \sum_{\kappa \in K_2} P(\kappa)$.
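The role of mutual incompatibility can be seen on the sets $K_1$ and $K_2$ of Equations (1) and (2). The Python sketch below assumes a hypothetical encoding in which, since {X/david} is the only grounding in Example 1, a composite choice is a dictionary from clause name to head index and a world is a choice for both clauses. It checks that $K_1$ and $K_2$ identify the same five worlds, that only $K_2$ is mutually incompatible, and that summing $P(\kappa)$ over $K_1$ overcounts (1.1 instead of 0.8), because the world selecting moderate_sneezing(david) in both clauses is counted twice.

```python
from itertools import product

# Hypothetical encoding of Example 1: X/david is the only grounding, so a composite
# choice is a dict {clause: head index} and a world is a choice for both clauses.
ALPHA = {"C1": [0.3, 0.5, 0.2], "C2": [0.2, 0.6, 0.2]}  # null atom last
WORLDS = [dict(zip(ALPHA, picks)) for picks in product((1, 2, 3), repeat=len(ALPHA))]


def p_choice(kappa):
    """P(kappa): product of the annotations of the atomic choices in kappa."""
    result = 1.0
    for clause, i in kappa.items():
        result *= ALPHA[clause][i - 1]
    return result


def omega(K):
    """Indices of the worlds identified by the set of composite choices K."""
    return {n for n, w in enumerate(WORLDS)
            if any(all(w[c] == i for c, i in kappa.items()) for kappa in K)}


def incompatible(k1, k2):
    """Two composite choices are incompatible if they select different heads for a clause."""
    return any(c in k2 and k2[c] != i for c, i in k1.items())


def mutually_incompatible(K):
    return all(incompatible(K[a], K[b]) for a in range(len(K)) for b in range(a + 1, len(K)))


K1 = [{"C1": 2}, {"C2": 2}]                               # Equation (1)
K2 = [{"C1": 2, "C2": 1}, {"C1": 2, "C2": 3}, {"C2": 2}]  # Equation (2)
print(omega(K1) == omega(K2), mutually_incompatible(K1), mutually_incompatible(K2))
# True False True
print(round(sum(p_choice(k) for k in K1), 10), round(sum(p_choice(k) for k in K2), 10))
# 1.1 0.8
```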

Thus, we can define a unique probability measure $\mu : \Omega_T \to [0, 1]$ where $\Omega_T$ is
defined as the set of sets of worlds identified by finite sets of finite composite choices:
$\Omega_T = \{\omega_K \mid K \text{ is a finite set of finite composite choices}\}$. It is easy to see that $\Omega_T$
is an algebra over $W_T$. Then $\mu$ is defined by $\mu(\omega_K) = \sum_{\kappa \in K'} P(\kappa)$, where $K'$ is a
finite mutually incompatible set of finite composite choices such that $\omega_K = \omega_{K'}$.
As is the case for ICL, $(W_T, \Omega_T, \mu)$ is a probability space (Kolmogorov 1950).

Definition 1 

The probability of a ground atom A is given by $P(A) = \mu(\{w \mid w \in W_T \wedge w \models A\})$.

If A has a finite set $K$ of finite explanations such that $K$ is covering, then
$\{w \mid w \in W_T \wedge w \models A\} = \omega_K \in \Omega_T$ and $\mu(\{w \mid w \in W_T \wedge w \models A\}) = \mu(\omega_K)$, so $P(A)$ is well-defined.

In the case of Example 1, $K_2$ shown in Equation (2) is a finite covering set of finite
explanations for moderate_sneezing(david) that is mutually incompatible, so

$$P(\mathit{moderate\_sneezing}(david)) = 0.5 \cdot 0.2 + 0.5 \cdot 0.2 + 0.6 = 0.8.$$



4 Dynamic Stratification of LPADs 

One of the most important formulations of stratification is that of dynamic strat- 
ification. ( [Przymusinski 19891 ) shows that a program has a 2-valucd well-founded 
model iff it is dynamically stratified, so that it is the weakest notion of stratifica- 
tion that is consistent with the WFS. As presented in ( [Przymusinski 19*891 ), dynamic 
stratification computes strata via operators on 3-valued interpretations - pairs of 
the form {Tr;Fa), where Tr and Fa are subsets of the Hcrbrand base Hp of a 
normal program P. 

Definition 2 

For a normal program P, sets $Tr$ and $Fa$ of ground atoms, and a 3-valued
interpretation $I$ we define

$True^P_I(Tr) = \{A \mid$ A is not true in I; and there is a clause $B \leftarrow L_1, \ldots, L_n$ in P, a
ground substitution $\theta$ such that $A = B\theta$ and for every $1 \le i \le n$ either $L_i\theta$ is
true in I, or $L_i\theta \in Tr\}$;

$False^P_I(Fa) = \{A \mid$ A is not false in I; and for every clause $B \leftarrow L_1, \ldots, L_n$ in P and
ground substitution $\theta$ such that $A = B\theta$ there is some i ($1 \le i \le n$) such that
$L_i\theta$ is false in I or $L_i\theta \in Fa\}$.

(Przymusinski 1989) shows that $True^P_I$ and $False^P_I$ are both monotonic, and
defines $\mathcal{T}^P_I$ as the least fixed point of $True^P_I(\emptyset)$ and $\mathcal{F}^P_I$ as the greatest fixed point of
$False^P_I(\mathcal{H}_P)$². In words, the operator $\mathcal{T}^P_I$ extends the interpretation I to add the
new atomic facts that can be derived from P knowing I; $\mathcal{F}^P_I$ adds the new
negations of atomic facts that can be shown false in P by knowing I (via the uncovering
of unfounded sets). An iterated fixed point operator builds up dynamic strata by
constructing successive partial interpretations as follows.

Definition 3 (Iterated Fixed Point and Dynamic Strata)
For a normal program P let

$WFM_0 = \langle \emptyset; \emptyset \rangle$;
$WFM_{\alpha+1} = WFM_\alpha \cup \langle \mathcal{T}_{WFM_\alpha}; \mathcal{F}_{WFM_\alpha} \rangle$;
$WFM_\alpha = \bigcup_{\beta < \alpha} WFM_\beta$, for limit ordinal $\alpha$.

Let $WFM(P)$ denote the fixed point interpretation $WFM_\delta$, where $\delta$ is the smallest
(countable) ordinal such that both sets $\mathcal{T}_{WFM_\delta}$ and $\mathcal{F}_{WFM_\delta}$ are empty. We refer to
$\delta$ as the depth of program P. The stratum of atom A is the least ordinal $\beta$ such that
$A \in WFM_\beta$ (where A may be either in the true or false component of $WFM_\beta$).

(Przymusinski 1989) shows that the iterated fixed point $WFM(P)$ is in fact the
well-founded model and that any undefined atoms of the well-founded model do
not belong to any stratum, i.e. they are not added to $WFM_\delta$ for any ordinal $\delta$.
Thus, a program is dynamically stratified if every atom belongs to a stratum.
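For a finite ground normal program, the iterated fixed point of Definition 3 can be computed directly. The Python sketch below is an illustration under assumptions of our own (a program is a list of `(head, body)` pairs, a literal is `("+", atom)` or `("-", atom)`, and all names are hypothetical). It computes $\mathcal{T}_I$ as the least fixed point of $True_I$ and $\mathcal{F}_I$ as the greatest fixed point of $False_I$ (the unfounded atoms), and records for each atom the index $\alpha$ of the step from $WFM_\alpha$ to $WFM_{\alpha+1}$ in which it is added; undefined atoms receive no entry. On the program of Example 2 below, this assigns p, q and r to step 0 and s to step 1, matching the strata given there.

```python
# Hypothetical encoding of a finite ground normal program:
# a list of (head, body) pairs, where body is a list of literals ("+", atom) or ("-", atom).


def lit_true(lit, T, F):
    """A literal is true in the 3-valued interpretation <T; F>."""
    sign, atom = lit
    return atom in T if sign == "+" else atom in F


def lit_false(lit, T, F):
    """A literal is false in the 3-valued interpretation <T; F>."""
    sign, atom = lit
    return atom in F if sign == "+" else atom in T


def true_lfp(prog, T, F):
    """Least fixed point of True_I starting from the empty set (the operator T_I)."""
    tr, changed = set(), True
    while changed:
        changed = False
        for head, body in prog:
            if head in T or head in tr:
                continue
            # every body literal is true in I, or is a positive literal already in tr
            if all(lit_true(l, T, F) or (l[0] == "+" and l[1] in tr) for l in body):
                tr.add(head)
                changed = True
    return tr


def false_gfp(prog, atoms, T, F):
    """Greatest fixed point of False_I from the Herbrand base (the operator F_I).
    Atoms already true in I cannot be unfounded, so they are excluded from the start."""
    fa, changed = atoms - T - F, True
    while changed:
        changed = False
        for a in list(fa):
            for head, body in prog:
                # a clause for a with no literal false in I and no positive literal
                # in fa supports a, so a is not unfounded
                if head == a and not any(
                        lit_false(l, T, F) or (l[0] == "+" and l[1] in fa) for l in body):
                    fa.discard(a)
                    changed = True
                    break
    return fa


def iterated_fixpoint(prog):
    """Return (true atoms, false atoms, step), where step[A] is the alpha such that A
    is added when WFM_{alpha+1} is built from WFM_alpha."""
    atoms = {h for h, _ in prog} | {a for _, body in prog for _, a in body}
    T, F, step, alpha = set(), set(), {}, 0
    while True:
        new_t, new_f = true_lfp(prog, T, F), false_gfp(prog, atoms, T, F)
        if not new_t and not new_f:
            return T, F, step
        for a in new_t | new_f:
            step[a] = alpha
        T, F = T | new_t, F | new_f
        alpha += 1


# The program of Example 2, encoded as above.
EX2 = [("s", [("-", "s")]), ("s", [("-", "p"), ("-", "q"), ("-", "r")]),
       ("p", [("+", "q"), ("-", "r"), ("-", "s")]),
       ("q", [("+", "r"), ("-", "p")]), ("r", [("+", "p"), ("-", "q")])]
T, F, step = iterated_fixpoint(EX2)
print(sorted(T), sorted(F))  # ['s'] ['p', 'q', 'r']
```

The sketch only handles finite ground programs, where the limit-ordinal case of Definition 3 never arises; for the program p ← ¬p it returns empty true and false sets, i.e., p is undefined and belongs to no stratum.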

Dynamic stratification captures the order in which recursive components of a
program must be evaluated. Because of this, dynamic stratification is useful for
modeling operational aspects of program evaluation. Fixed-order dynamic stratification
(Sagonas et al. 2000), used in Section 7, models programs whose well-founded
model can be evaluated using a fixed literal selection strategy. In this class, the
definition of $False^P_I(Fa)$ in Definition 2 is replaced by³:

$False^P_I(Fa) = \{A \mid$ A is not false in I; and for every clause $B \leftarrow L_1, \ldots, L_n$ in P and
ground substitution $\theta$ such that $A = B\theta$ there is some i ($1 \le i \le n$) such that
$L_i\theta$ is false in I or $L_i\theta \in Fa$, and for all j ($1 \le j \le i - 1$), $L_j\theta$ is true in I$\}$.

(Sagonas et al. 2000) describes how fixed-order dynamic stratification captures those
programs that a tabled evaluation can evaluate with a fixed literal selection strategy
(i.e. without the SLG operations of simplification and delay).



² Below, we will sometimes omit the program P in these operators when the context is clear.
³ Without loss of generality, we assume throughout that the fixed literal selection strategy is
left-to-right as in Prolog.
Example 2

The following program has a 2-valued well-founded model and so is dynamically
stratified, but does not belong to other stratification classes in the literature, such
as local, modular, or weak stratification.

s ← ¬s.      s ← ¬p, ¬q, ¬r.

p ← q, ¬r, ¬s.      q ← r, ¬p.      r ← p, ¬q.

p, q, and r all belong to stratum 0, while s belongs to stratum 1. Note that the above
program also meets the definition of fixed-order dynamically stratified, as does the
simple program

p ← ¬p.      p.

which is not locally, modularly, or weakly stratified. Fixed-order stratification is
more general than local stratification, and than modular stratification (since modular
stratified programs can be decidably rearranged so that they have failing prefixes).
It is neither more nor less general than weak stratification.

As the above examples show, fixed-order dynamic stratification is a fairly weak
property for a program to have. The above definitions of (fixed-order) dynamic
stratification for normal programs can be straightforwardly adapted to LPADs:
an LPAD T is (fixed-order) dynamically stratified if each $w \in W_T$ is (fixed-order)
dynamically stratified.



4.1 Conditions for Well-Definedness of the Distribution Semantics

When a given LPAD T contains function symbols there are two reasons why the
distribution semantics may not be well-defined for T. First, a world of T may not
have a two-valued well-founded model; and second, $\mathcal{H}_T$ may contain an atom that
does not have a finite set of finite explanations that is covering (cf. Section 3). As
noted in Section 2, we consider only sound LPADs in this paper, and in this section we
address the problem of determining whether $\mathcal{H}_T$ may contain an atom that does not
have a finite set of finite explanations that is covering.

As is usual in logic programming, we assume that a program P is defined over
a language with a finite number of function and constant symbols. Given such an
assumption, placing an upper bound on the size of terms in a derivation implies
that the number of different terms in a derivation must be finite and, for certain
methods of derivation such as tabled or bottom-up evaluations, that the derivation
itself is finite.

To motivate our definitions, consider the normal program $T_{inf}$:

p(s(X)) ← p(X).      p(0).

This program does not have a model with a finite number of true or undefined
atoms, and accordingly, there is no upper limit on the size of atoms produced
either in a bottom-up derivation of the program (e.g. using the fixed-point
characterization of Definition 3), or in a top-down evaluation of the query p(Y). However,
the superficially similar program $T_{fin}$
p(X) ← p(f(X)).      p(0).

does have a model with a finite number of true and undefined atoms. Of course, the
model for the program does not have a finite number of false atoms, but (default)
false atoms are generally not explicitly represented in derivations. The model can
in fact be produced by various derivation techniques, such as an alternating fixed
point computation (van Gelder 1989) based on sets of true and of true or undefined
atoms, or by tabling with term depth abstraction (Tamaki and Sato 1986).

From the perspective of the distribution semantics consider $T'_{fin}$, the extension
of $T_{fin}$ with the clause

q : 0.5 ← p(X).

and $T'_{inf}$, the similar extension of $T_{inf}$. Recall from Definition 1 that the probability
of an atom A in an LPAD is defined as a probability measure that is constructed
from finite sets of finite composite choices: accordingly, the distribution semantics
for A is well-defined if and only if it has a finite set of finite explanations that is
covering. In $T'_{fin}$, q has such a finite set of finite explanations that is covering, and
so its distribution semantics is well-defined. However, in $T'_{inf}$, q does not have a
finite set of finite explanations that is covering, and so the distribution semantics
is not well-defined for q, even though every world of $T'_{inf}$ has a total well-founded
model.

The following definition captures these intuitions, basing the notion of bounded 
term-size on the preceding definition of dynamic stratification. 

Definition 4 (Bounded Term-size Programs)

Let P be the ground instantiation of a normal program, and $I, Tr \subseteq \mathcal{H}_P$. Then an
application of $True^P_I(Tr)$ (Definition 2) has the bounded term-size property if there
is an integer L such that the size of every ground substitution $\theta$ used to produce an
atom in $True^P_I(Tr)$ is less than L. P itself has the bounded term-size property if
every application of $True^P_I$ used to construct $WFM(P)$ has the bounded term-size
property with the same bound L. Finally, an LPAD T has the bounded term-size
property if each world of T has the bounded term-size property.

Note that $T_{inf}$ does not have the bounded term-size property, but $T_{fin}$ does.
While determining whether a program P is bounded term-size is clearly undecidable
in general, $T_{fin}$ shows that ground(P) need not be finite if P is bounded term-size.
However, the model of P may be characterized as follows⁴:

Theorem 1 

Let P be a normal program. Then $WFM(P)$ has a finite number of true atoms iff
P has the bounded term-size property. 

Theorem 1 gives a clear model-theoretic characterization of bounded term-size normal
programs: note that if ground(P) is infinite, then $WFM(P)$ may have an
infinite number of false or undefined atoms. In the context of LPADs, the bounded
term-size property ensures the well-definedness of the distribution semantics.

⁴ The proof of this and other theorems is given in the online Appendix to this paper.
Theorem 2

Let T be a sound bounded term-size LPAD, and let $A \in \mathcal{H}_T$. Then A has a finite
set of finite explanations that is covering.

The proof of Theorem 2 is presented in the online Appendix; here we indicate the
intuition behind the proof. First, we note that it is straightforward to show that,
since each world of an LPAD T has a finite number of true atoms by Theorem 1,
explanations are finite. On the other hand, showing that a query has a finite covering
set of explanations is less obvious, as T could have an infinite number of worlds.
The proof addresses this by showing that T has a finite number of models, in turn
shown by demonstrating the existence of a bound $L_T$ on the maximal size of any
true atom in any world of T. The existence of $L_T$ is shown by contradiction, by
demonstrating that if no bound existed, a world could be constructed that was not
bounded term-size. The idea is explained in the following example.

Example 3 

Consider the program 

q : 0.5 ∨ p(f(X)) : 0.5 ← p(X).      p(0).
This program has an infinite number of finite models, which consist of the true atoms

{q, p(0)}, {q, p(0), p(f(0))}, {q, p(0), p(f(0)), p(f(f(0)))}, . . .

depending on the selections made for instantiations of the first clause, and so no
finite bound $L_T$ exists for this program. However, such a program also has a selection
that gives rise to an infinite model

{p(0), p(f(0)), p(f(f(0))), p(f(f(f(0)))), . . .}

and so is not bounded term-size.

Although bounded term-size programs have appealing properties, such programs 
can make only weak use of function symbols. For instance, a program containing 
the Prolog predicate member/2 would not be bounded term-size, although as any 
Prolog programmer knows, a query to member/2 will terminate whenever the second 
argument of the query is ground. We capture this intuition with bounded term-size 
queries. The definition of such queries relies on the notion of an atom dependency 
graph, whose definition we state for LPADs. 

Definition 5 (Atom Dependency Graph)

Let T be a ground LPAD. Then the atom dependency graph of T is a graph (V, E)
such that $V = \mathcal{H}_T$ and an edge $(v_1, v_2) \in E$ iff there is a clause $C \in T$ such that

1. $(v_1 : \alpha_1) \in head(C)$, and $v_2$ or $\neg v_2 \in body(C)$; or

2. $(v_1 : \alpha_1), (v_2 : \alpha_2) \in head(C)$.

Definition 5 includes dependencies among atoms in the head of a disjunctive LPAD
clause, similar to how dependencies are defined in disjunctive logic programs. Given 
a ground LPAD T, the atom dependency graph of T is used to bound the search 
space of a (relevant) derivation in a world of T under the WFS. 
Definition 6 (Bounded Term-size Queries)

Let T be a ground LPAD, and Q an atomic query to T (not necessarily ground).
Then the atomic search space of Q consists of the union of all ground instantiations
of Q in $\mathcal{H}_T$ together with all atoms reachable in the atom dependency graph of T
from any ground instantiation of Q. Let

$$T_Q = \{C \mid C \in T \text{ and a head atom of } C \text{ is in the atomic search space of } Q\}$$

The query Q is bounded term-size if $T_Q$ is a bounded term-size program.
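For a finite ground LPAD, Definitions 5 and 6 translate into a graph construction followed by a reachability search. The Python sketch below assumes a hypothetical encoding (each clause is a pair of head atoms and body literals, with the implicit null atom omitted because it never occurs in a body) and handles only ground queries; for a non-ground query one would start the search from all of its ground instances. It computes the atomic search space and $T_Q$, but does not decide whether $T_Q$ is bounded term-size, which is undecidable in general.

```python
from collections import deque

# Hypothetical encoding of a ground LPAD: each clause is (head_atoms, body_literals),
# literals are ("+", atom) or ("-", atom); the implicit null atom is left out
# because it never occurs in a body.


def dependency_graph(program):
    """Edges of the atom dependency graph (Definition 5), as adjacency sets."""
    edges = {}
    for heads, body in program:
        for v1 in heads:
            out = edges.setdefault(v1, set())
            out.update(atom for _, atom in body)     # case 1: v2 or not v2 in the body
            out.update(h for h in heads if h != v1)  # case 2: v2 in the same head
    return edges


def atomic_search_space(program, ground_query):
    """Atoms reachable from a ground query, including the query itself (Definition 6)."""
    edges = dependency_graph(program)
    seen, todo = {ground_query}, deque([ground_query])
    while todo:
        for nxt in edges.get(todo.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def relevant_clauses(program, ground_query):
    """T_Q: the clauses with a head atom in the atomic search space of the query."""
    space = atomic_search_space(program, ground_query)
    return [c for c in program if any(h in space for h in c[0])]


# The grounding of Example 1.
EX1 = [
    (["strong_sneezing(david)", "moderate_sneezing(david)"], [("+", "flu(david)")]),
    (["strong_sneezing(david)", "moderate_sneezing(david)"], [("+", "hay_fever(david)")]),
    (["flu(david)"], []),
    (["hay_fever(david)"], []),
]
print(sorted(atomic_search_space(EX1, "flu(david)")))  # ['flu(david)']
print(len(relevant_clauses(EX1, "moderate_sneezing(david)")))  # 4
```

Case 2 of Definition 5 is what pulls strong_sneezing(david) into the search space of moderate_sneezing(david): the two atoms share the heads of C1 and C2, so a derivation of one must consider the choices that derive the other.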

The notion of a bounded term-size query will be used in Section 6 to characterize
termination of the SLG tabling approach, and in Section 8 to characterize
correctness and termination of our tabled PITA implementation.

4.2 Comparisons of Termination Properties

We next consider how the concepts of bounded term-size programs and queries 
relate to some other classes of programs for which termination has been studied. 
Since the definitions of the previous section are based on LPADs, and other work in 
the literature is often based on disjunctive logic programs, we restrict our attention 
to normal programs, for which the semantics coincide. 

(Baselice et al. 2009) studies the class of finitely recursive programs, which is
a superset of finitary programs previously introduced into the literature by the
authors. The paper first defines a dependency graph, which for normal programs
is essentially the same as Definition 5. A finitely recursive normal program, then,
is one for which in its atom dependency graph, only a finite number of vertices are
reachable from any vertex. It is easy to see that neither the class of bounded term-size
programs nor the class of finitely recursive programs is a subclass of the other. A program
containing simply member/2 (and a constant) is finitely recursive, but is not bounded
term-size. However, the program

p(X) ← p(f(X)).

has bounded term-size, as does the program

p(s(X)) ← q(X), p(X).      p(0).

although neither is finitely recursive (for the last program, the failure of q(X) means
that all applications of $True_I$ have bounded term-size). However, note that for
any program P that is finitely recursive, all ground atomic queries to P will have
bounded term-size, even if P itself is not bounded term-size.

Another recent work (Calimeri et al. 2008) defines the class of finitely-ground
programs. We do not present its formalism here, but Corollary 1 of (Calimeri et al. 2008)
states that if a program is finitely-ground, it will have a finite number of answer
sets and each answer set will be finite (as represented by the set of true atoms in
the model). By Theorem 1 of this paper, such a program will have bounded term-size,
so that finitely-ground programs may be co-extensive with bounded term-size
programs. On the other hand, (Calimeri et al. 2008) notes that finitely-ground
programs and finitely recursive programs are incompatible. Non-range restricted
programs are not finitely-ground, although they can be finitely recursive. As
discussed above, any ground atomic query to a finitely recursive program will have
bounded term-size, so that finitely-ground programs must be a proper subclass of
those programs for which all ground atomic queries have bounded term-size.
To summarize for normal programs: 

• Finitely recursive and bounded term-size programs are incompatible, but 
finitely recursive programs are a proper subclass of those programs for which 
all ground atomic queries are bounded term-size. 

• Finitely-ground and bounded term-size programs appear to be co-extensive,
but finitely-ground programs are a proper subclass of those programs for
which all ground atomic queries are bounded term-size.



5 Representing Explanations by Means of Decision Diagrams 

In order to represent explanations we can use Multivalued Decision Diagrams (MDDs) (Thayse et al. 1978). An MDD represents a function f(X) taking Boolean values on a set of multivalued variables X by means of a rooted graph that has one level for each variable. Each node N has one child for each possible value of the multivalued variable associated with N. The leaves store either 0 or 1. Given values for all the variables X, an MDD can be used to compute the value of f(X) by traversing the graph starting from the root and returning the value associated with the leaf that is reached.
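The traversal just described can be sketched in a few lines of Python. The node encoding is an assumption made only for this illustration (a leaf is the integer 0 or 1, an internal node is a pair of a variable name and a dictionary from values to children); it is not the representation of any particular MDD package. The example diagram encodes f(X1, X2) = (X1 = 2) ∨ (X2 = 2) over two three-valued variables and shares the X2 node between two branches, as a decision diagram may.

```python
from typing import Dict, Tuple, Union

# Hypothetical MDD encoding (illustration only): a leaf is the int 0 or 1,
# an internal node is (variable name, {value: child}) with one child per value.
MDD = Union[int, Tuple[str, Dict[int, "MDD"]]]


def evaluate(mdd: MDD, assignment: Dict[str, int]) -> int:
    """Compute f(X) by following, from the root, the child selected by each value."""
    while not isinstance(mdd, int):
        var, children = mdd
        mdd = children[assignment[var]]
    return mdd


# f(X1, X2) = (X1 = 2) or (X2 = 2); the X2 node is shared by two branches.
x2_node = ("X2", {1: 0, 2: 1, 3: 0})
root = ("X1", {1: x2_node, 2: 1, 3: x2_node})

print(evaluate(root, {"X1": 1, "X2": 2}))  # 1
print(evaluate(root, {"X1": 3, "X2": 1}))  # 0
```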

Given a set of explanations K, we obtain a Boolean function f_K in the following way. Each ground clause Cθ appearing in K is associated with a multivalued variable X_{Cθ} with as many values as atoms in the head of C. In other words, each atomic choice (C, θ, i) is represented by the propositional equation X_{Cθ} = i. Equations for a single explanation are conjoined and the conjunctions for the different explanations are disjoined. The set of explanations in Equation (1) can be represented by the function

$$f_K(\mathbf{X}) = (X_{C_1\{X/david\}} = 2) \vee (X_{C_2\{X/david\}} = 2).$$

The MDD shown in Figure 1(a) represents f_K(X).

Given an MDD M, we can identify a set of explanations K_M associated with M that is obtained by considering each path from the root to a 1 leaf as an explanation. It is easy to see that if K is a set of explanations and M is obtained from f_K, then K and K_M represent the same set of worlds, i.e., ω_K = ω_{K_M}.

Note that K_M is mutually incompatible because at each level we branch on a variable, so that the explanations associated with the leaves that are descendants of a child of a node N are incompatible with those of any other child of N.

By converting a set of explanations into a mutually incompatible set of explanations, MDDs allow the computation of μ(ω_K) (Section 3) for any K. This is equivalent to computing the probability of a DNF formula, which is #P-complete (?). Decision diagrams offer a practical solution for this problem and were shown to perform better than other methods (De Raedt et al. 2007).
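The following sketch, under the same hypothetical node encoding, shows why mutual incompatibility matters: the probability of f_K is the sum, over root-to-1 paths, of the product of the probabilities of the values chosen along each path. The variables stand for X_{C1{X/david}}, with parameters [0.3, 0.5, 0.2] as in Example 4, and X_{C2{X/david}}, whose parameters [0.2, 0.6, 0.2] are an assumption made only for this illustration. Adding the probabilities of the two original explanations directly would count twice the worlds in which both hold.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical MDD encoding (illustration only): a leaf is 0 or 1, an internal
# node is (variable name, {value: child}) with values numbered from 1.
MDD = Union[int, Tuple[str, Dict[int, "MDD"]]]


def prob(mdd: MDD, dist: Dict[str, List[float]]) -> float:
    """Probability that f is 1, given independent multivalued variables.

    Each node branches on one variable (tested at most once per path), so the
    root-to-1 paths are mutually incompatible and their probabilities add up.
    """
    if isinstance(mdd, int):
        return float(mdd)
    var, children = mdd
    return sum(dist[var][value - 1] * prob(child, dist)
               for value, child in children.items())


# f_K = (X1 = 2) or (X2 = 2), with X1 for C1{X/david} and X2 for C2{X/david}.
x2_node = ("X2", {1: 0, 2: 1, 3: 0})
f_K = ("X1", {1: x2_node, 2: 1, 3: x2_node})
dist = {"X1": [0.3, 0.5, 0.2],   # parameters of C1 (Example 4)
        "X2": [0.2, 0.6, 0.2]}   # assumed parameters for C2

p_query = prob(f_K, dist)        # 0.5 + 0.5 * 0.6, i.e. about 0.8
naive_sum = 0.5 + 0.6            # adding the explanations overcounts: 1.1 > 1
```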

Fig. 1. Decision diagrams for Example 1: (a) MDD; (b) BDD.

Decision diagrams can be built with various software packages that provide highly efficient implementations of Boolean operations. However, most packages are restricted to Binary Decision Diagrams (BDDs), i.e., decision diagrams in which all the variables are Boolean. To manipulate MDDs with a BDD package, we must represent multivalued variables by means of binary variables. Various options are possible; we found that the following, proposed in (De Raedt et al. 2008), gives the best performance. For a variable X having n values, we use n - 1 Boolean variables X_1, ..., X_{n-1}, and we represent the equation X = i, for i = 1, ..., n - 1, by means of the conjunction

$$\neg X_1 \wedge \neg X_2 \wedge \dots \wedge \neg X_{i-1} \wedge X_i,$$

and the equation X = n by means of the conjunction

$$\neg X_1 \wedge \neg X_2 \wedge \dots \wedge \neg X_{n-1}.$$
The BDD representation of the function is given in Figure 1(b). The Boolean variables are associated with the following parameters:

$$P(X_1) = P(X = 1), \qquad P(X_i) = \frac{P(X = i)}{\prod_{j=1}^{i-1} \bigl(1 - P(X_j)\bigr)} \quad \text{for } i = 2, \dots, n-1.$$


6 Tabling and Answer Subsumption 

The idea behind tabling is to maintain in a table both subgoals encountered in a query evaluation and answers to these subgoals. If a subgoal is encountered more than once, the evaluation reuses information from the table rather than re-performing resolution against program clauses. Although the idea is simple, it has important consequences. First, tabling ensures termination for a wide class of programs, and it is often easier to reason about termination in programs using tabling than in basic Prolog. Second, tabling can be used to evaluate programs with negation according to the WFS. Third, for queries to wide classes of programs, such as Datalog programs with negation, tabling can achieve the optimal complexity for query evaluation. Finally, tabling integrates closely with Prolog, so that Prolog's familiar programming environment can be used, and no other language is required to build complete systems. As a result, a number of Prologs now support tabling, including XSB, YAP, B-Prolog, ALS, and Ciao. In these systems, a predicate p/n is evaluated using SLDNF by default; the predicate is made to use tabling by a declaration such as table p/n that is added by the user or compiler.
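The sketch below illustrates the basic idea on the left-recursive definition path(X,X). path(X,Y) ← path(X,Z), arc(Z,Y). used in Section 10: plain SLD resolution would loop on the repeated call path(X,Z), whereas keeping a table of answers for that subgoal and consuming it until no new answer appears terminates even on a cyclic graph. It is a bottom-up, semi-naive stand-in written in Python for illustration only, not an implementation of SLG resolution; the function name is hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def tabled_path(source: str, edges: Set[Edge]) -> Set[str]:
    """Answers Y of path(source, Y) for the program
         path(X, X).
         path(X, Y) <- path(X, Z), arc(Z, Y).
    The single subgoal path(source, _) gets one answer table; instead of
    re-resolving the left-recursive call, each round consumes the answers
    found in the previous round and stops when no new answer appears
    (a simple stand-in for SLG completion, not SLG itself).
    """
    arc = edges | {(b, a) for a, b in edges}  # arc/2 is symmetric
    table: Set[str] = {source}                # answer from path(X, X)
    frontier = {source}
    while frontier:
        new = {y for (z, y) in arc if z in frontier} - table
        table |= new
        frontier = new
    return table


edges = {("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")}
print(sorted(tabled_path("a", edges)))  # ['a', 'b', 'c'], despite the cycle
```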

This paper makes use of a tabling feature called answer subsumption. Most formulations of tabling add an answer A to a table for a subgoal S only if A is not a variant (as a term) of any other answer for S. However, in many applications it may be useful to order answers according to a partial order or (upper semi-)lattice. As an example, consider the case of a lattice on the values of the second argument of a binary predicate p/2. Answer subsumption may be specified by a declaration such as table p(_,or/3 - zero/1), where zero/1 is the bottom element of the lattice and or/3 is the join operation of the lattice. For example, if a table had an answer p(a, b1) and a new answer p(a, b2) were derived, the answer p(a, b1) would be replaced by p(a, b3), where b3 is the join of b1 and b2 obtained by calling or(b1, b2, b3). In the PITA algorithm for LPADs presented in Section 7, the last argument of an atom is used to store explanations for the atom in the form of BDDs, and the or/3 operation is the logical disjunction of two explanations.* Answer subsumption over arbitrary upper semi-lattices is implemented in XSB for stratified programs (Swift 1999b).
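A minimal Python sketch of a table with answer subsumption, assuming the lattice is given by a join function and a bottom element (mirroring or/3 and zero/1 in the declaration above). The class name is hypothetical. In the usage lines, explanations are modelled as sets of worlds, so that disjunction is set union; PITA uses BDDs instead.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

V = TypeVar("V")


class SubsumingTable(Generic[V]):
    """One answer per subgoal, joined with each new answer (cf. or/3 - zero/1)."""

    def __init__(self, join: Callable[[V, V], V], bottom: V) -> None:
        self.join = join
        self.bottom = bottom
        self.answers: Dict[Hashable, V] = {}

    def add(self, subgoal: Hashable, value: V) -> bool:
        """Replace the stored answer by its join with value.

        Returns True if the table changed; an answer already subsumed by the
        stored one changes nothing, so no further work needs to be triggered.
        """
        old = self.answers.get(subgoal, self.bottom)
        new = self.join(old, value)
        if new == old:
            return False
        self.answers[subgoal] = new
        return True


# Explanations as sets of worlds: disjunction is union, bottom is the empty set.
table = SubsumingTable(lambda x, y: x | y, frozenset())
table.add(("p", "a"), frozenset({"w1"}))            # p(a, b1)
table.add(("p", "a"), frozenset({"w2"}))            # p(a, b2): answer becomes b1 or b2
changed = table.add(("p", "a"), frozenset({"w1"}))  # subsumed, so False
```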

For formal results in this section and Section 8, we use SLG resolution (Chen and Warren 1996) under the forest-of-trees representation (Swift 1999a); this framework is extended with answer subsumption in the proof of Theorem 4. However, first we present a theorem stating that bounded term-size queries (Definition 6) to normal programs are amenable to top-down evaluation using tabling. Although SLG has been shown to finitely terminate for other notions of bounded term-size queries, the concept as presented in Definition 6 is based on a bottom-up fixed-point definition of WFS, and only bounds the size of substitutions used in True^P_I of Definition 2 but not of False^P_I. In fact, to prove termination of SLG with respect to bounded term-size queries, SLG must be extended so that its New Subgoal operation performs what is called term-depth abstraction (Tamaki and Sato 1986), explained informally as follows. An SLG evaluation can be formalized as a forest of trees in which each tree corresponds to a unique (up to variance) subgoal. The SLG New Subgoal operation checks whether a given selected subgoal S is the root of any tree in the current forest. If not, then a new tree with root S is added to the forest. Without term-depth abstraction, an SLG evaluation of the query p(a) and the program consisting of the single clause

p(X) ← p(f(X)).

would create an infinite number of trees. However, if the New Subgoal operation uses term-depth abstraction, any subterm in S over a pre-specified maximal depth would be replaced by a new variable. For example, in the above program, if the maximal depth were specified as 3, the subgoal p(f(f(f(a)))) would be rewritten to p(f(f(f(X)))) for the purposes of creating a new tree. The subgoal p(f(f(f(a)))) would consume any answer from the tree for p(f(f(f(X)))) where the binding for X unified with a. In this manner it can be ensured that only a finite number of trees are created in the forest. This fact, together with the size bound on the derivation of answers provided by Definition 6, ensures the following theorem, where a finitely terminating evaluation may terminate normally or may terminate through floundering.

* The logical disjunction b3 can be seen as subsuming b1 and b2 over the partial order of implication defined on propositional formulas that represent explanations.
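Term-depth abstraction itself is easy to sketch. Below, terms use a hypothetical encoding (constants as strings, compound terms as tuples (functor, arg, ...), variables as Var objects) and depth is counted so that the subgoal's arguments are at depth 1, which reproduces the rewriting of p(f(f(f(a)))) to p(f(f(f(X)))) for a maximal depth of 3. The last lines check that the subgoals p(a), p(f(a)), p(f(f(a))), ... collapse to finitely many variants, so only finitely many trees would be created.

```python
import itertools


class Var:
    """A fresh logic variable (hypothetical term encoding)."""
    _ids = itertools.count()

    def __init__(self) -> None:
        self.name = f"_V{next(Var._ids)}"

    def __repr__(self) -> str:
        return self.name


def abstract(term, max_depth: int, depth: int = 0):
    """Replace every subterm deeper than max_depth by a fresh variable.

    Constants are strings and compound terms are tuples (functor, arg, ...).
    The subgoal is at depth 0 and its arguments at depth 1.
    """
    if depth > max_depth:
        return Var()
    if isinstance(term, tuple):
        functor, *args = term
        return (functor, *(abstract(a, max_depth, depth + 1) for a in args))
    return term


def variant_key(term):
    """Form of a term up to variable renaming (enough here: each variable occurs once)."""
    if isinstance(term, Var):
        return "_"
    if isinstance(term, tuple):
        return tuple(variant_key(t) for t in term)
    return term


def nest(k: int):
    """The subgoal p(f(...f(a)...)) with k occurrences of f."""
    t = "a"
    for _ in range(k):
        t = ("f", t)
    return ("p", t)


print(abstract(nest(3), 3))  # ('p', ('f', ('f', ('f', _V0))))
trees = {variant_key(abstract(nest(k), 3)) for k in range(50)}
print(len(trees))            # 4 distinct subgoals, however deep the input
```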

Theorem 3 

Let P be a fixed-order dynamically stratified normal program, and Q a bounded term-size query to P. Then there is an SLG evaluation of Q to P using term-depth abstraction that finitely terminates.

By the discussion of Section 4.2, Theorem 3 shows that there is an SLG evaluation with term-depth abstraction that finitely terminates on any ground query to a finitely recursive (Baselice et al. 2009) or finitely-ground (Calimeri et al. 2008) program that is fixed-order stratified.* While SLG itself is ideally complete for all normal programs, the PITA implementation is restricted to fixed-order stratified programs, so that Theorem 3 is used in the proof of the termination results of Section 8.



7 Program Transformation 

The first step of the PITA algorithm is to apply a program transformation to an LPAD to create a normal program that contains calls for manipulating BDDs. In our implementation, these calls provide a Prolog interface to the CUDD† C library and use the following predicates:‡

• init, end: for allocation and deallocation of a BDD manager, a data structure used to keep track of the memory for storing BDD nodes;

• zero(-BDD), one(-BDD), and(+BDD1,+BDD2,-BDDO), or(+BDD1,+BDD2,-BDDO), not(+BDDI,-BDDO): Boolean operations between BDDs;

• add_var(+N_Val,+Probs,-Var): addition of a new multivalued variable with N_Val values and parameters Probs;

• equality(+Var,+Value,-BDD): BDD represents Var = Value, i.e., that the random variable Var is assigned Value in the BDD;

• ret_prob(+BDD,-P): returns the probability of the formula encoded by BDD.

add_var(+N_Val,+Probs,-Var) adds a new random variable associated with a new instantiation of a rule with N_Val head atoms and parameter list Probs. The PITA transformation uses the auxiliary predicate get_var_n(+R,+S,+Probs,-Var) to wrap add_var/3 and avoid adding a new variable when one already exists for an instantiation. As shown below, a new fact var(R,S,Var) is asserted each time a new random variable is created, where R is an identifier for the LPAD clause, S is a list of constants, one for each variable of the clause, and Var is an integer that identifies the random variable associated with clause R under the grounding represented by S. The auxiliary predicate has the following definition:

    get_var_n(R, S, Probs, Var) :-
        (   var(R, S, Var)
        ->  true
        ;   length(Probs, L),
            add_var(L, Probs, Var),
            assert(var(R, S, Var))
        ).

* The proof of Theorem 3 relies on a delay-minimal evaluation of Q that does not produce any conditional answers, that is, an evaluation that does not explore the space of atoms that are undefined in WFM(P).

† http://vlsi.colorado.edu/~fabio/

‡ BDDs are represented in CUDD as pointers to their root node.
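For readers less familiar with assert-based memoization, the following Python analogue mirrors the logic of get_var_n/4: one random variable per pair (clause identifier, grounding). The class name and the stand-in add_var, which simply numbers variables and records their parameters, are assumptions of this sketch and do not reflect the CUDD-based implementation.

```python
from typing import Dict, List, Tuple


class VarStore:
    """Python analogue of get_var_n/4 (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Plays the role of the asserted var(R, S, Var) facts.
        self.var: Dict[Tuple[int, Tuple[str, ...]], int] = {}
        self.params: List[List[float]] = []

    def add_var(self, n_val: int, probs: List[float]) -> int:
        """Stand-in for add_var/3: a new variable is just the next integer."""
        assert n_val == len(probs)
        self.params.append(list(probs))
        return len(self.params) - 1

    def get_var_n(self, r: int, s: List[str], probs: List[float]) -> int:
        key = (r, tuple(s))
        if key not in self.var:  # var(R, S, Var) -> true ; create and record it
            self.var[key] = self.add_var(len(probs), probs)
        return self.var[key]


store = VarStore()
v1 = store.get_var_n(1, ["david"], [0.3, 0.5, 0.2])  # new variable 0
v2 = store.get_var_n(1, ["david"], [0.3, 0.5, 0.2])  # same grounding: reused
v3 = store.get_var_n(1, ["mary"], [0.3, 0.5, 0.2])   # new grounding: variable 1
```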

The PITA transformation applies to atoms, literals and clauses. If H is an atom, PITA_H(H) is H with the variable BDD added as the last argument. If A_j is an atom, PITA_B(A_j) is A_j with the variable B_j added as the last argument. In either case, for an atom A, BDD(PITA(A)) is the value of the last argument of PITA(A). If L_j is a negative literal ¬A_j, PITA_B(L_j) is the conditional

(PITA'_B(A_j) -> not(BN_j, B_j) ; one(B_j)),

where PITA'_B(A_j) is A_j with the variable BN_j added as the last argument. In other words, the input BDD, BN_j, is negated if it exists; otherwise the BDD for the constant function 1 is returned.

A non-disjunctive fact C_r = H is transformed into the clause

PITA(C_r) = PITA_H(H) ← one(BDD).

A disjunctive fact C_r = H_1 : α_1 ∨ ... ∨ H_n : α_n, where the parameters sum to 1, is transformed into the set of clauses PITA(C_r):*

PITA(C_r, 1) = PITA_H(H_1) ← get_var_n(r, [], [α_1, ..., α_n], Var),
    equality(Var, 1, BDD).
...
PITA(C_r, n) = PITA_H(H_n) ← get_var_n(r, [], [α_1, ..., α_n], Var),
    equality(Var, n, BDD).

In the case where the parameters do not sum to one, the clause is first transformed into H_1 : α_1 ∨ ... ∨ H_n : α_n ∨ null : 1 - Σ_{i=1}^n α_i and then into the clauses above, where the list of parameters is [α_1, ..., α_n, 1 - Σ_{i=1}^n α_i] but the (n + 1)-th clause (the one for null) is not generated.

The definite clause C_r = H ← L_1, ..., L_m is transformed into the clause

PITA(C_r) = PITA_H(H) ← one(BB_0),
    PITA_B(L_1), and(BB_0, B_1, BB_1),
    ...,
    PITA_B(L_m), and(BB_{m-1}, B_m, BDD).

* The second argument of get_var_n is the empty list because a fact does not contain variables, since the program is bounded term-size.
The disjunctive clause

C_r = H_1 : α_1 ∨ ... ∨ H_n : α_n ← L_1, ..., L_m,

where the parameters sum to 1, is transformed into the set of clauses PITA(C_r):

PITA(C_r, 1) = PITA_H(H_1) ← one(BB_0),
    PITA_B(L_1), and(BB_0, B_1, BB_1),
    ...,
    PITA_B(L_m), and(BB_{m-1}, B_m, BB_m),
    get_var_n(r, VC, [α_1, ..., α_n], Var),
    equality(Var, 1, B), and(BB_m, B, BDD).
...
PITA(C_r, n) = PITA_H(H_n) ← one(BB_0),
    PITA_B(L_1), and(BB_0, B_1, BB_1),
    ...,
    PITA_B(L_m), and(BB_{m-1}, B_m, BB_m),
    get_var_n(r, VC, [α_1, ..., α_n], Var),
    equality(Var, n, B), and(BB_m, B, BDD).

where VC is a list containing each variable appearing in C_r. If the parameters do not sum to 1, the same technique used for disjunctive facts is used.
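The transformation is mechanical enough to sketch as a text generator. The Python function below, whose names and input format are hypothetical, emits the PITA clauses for a disjunctive clause in concrete Prolog syntax (:- instead of ←), adds the parameter of the implicit null head when the parameters sum to less than 1 without generating a clause for it, and uses the conditional shown earlier for negative literals. Applied to clause C_1 of Example 1, it yields, up to layout, the first clause of Example 4.

```python
from typing import List, Tuple, Union

Atom = Tuple[str, List[str]]             # (predicate, argument strings)
Literal = Union[Atom, Tuple[str, Atom]]  # an Atom, or ("\\+", Atom) for negation


def atom_with(atom: Atom, extra: str) -> str:
    """The atom with one extra last argument, as PITA_H / PITA_B do."""
    pred, args = atom
    return f"{pred}({','.join(args + [extra])})"


def pita_body(body: List[Literal]) -> List[str]:
    goals = ["one(BB0)"]
    for j, lit in enumerate(body, start=1):
        if lit[0] == "\\+":  # negative literal: negate BN_j if it exists, else one
            bn = f"BN{j}"
            goals.append(f"({atom_with(lit[1], bn)} -> not({bn},B{j}) ; one(B{j}))")
        else:
            goals.append(atom_with(lit, f"B{j}"))
        goals.append(f"and(BB{j - 1},B{j},BB{j})")
    return goals


def pita_disjunctive(r: int, heads: List[Atom], probs: List[float],
                     body: List[Literal], clause_vars: List[str]) -> List[str]:
    params = list(probs)
    if sum(probs) < 1 - 1e-9:  # parameter of the implicit null head
        params.append(round(1 - sum(probs), 10))
    m = len(body)
    plist = "[" + ",".join(repr(p) for p in params) + "]"
    vc = "[" + ",".join(clause_vars) + "]"
    clauses = []
    for i, head in enumerate(heads, start=1):  # no clause is generated for null
        goals = pita_body(body) + [
            f"get_var_n({r},{vc},{plist},Var)",
            f"equality(Var,{i},B)",
            f"and(BB{m},B,BDD)",
        ]
        clauses.append(f"{atom_with(head, 'BDD')} :- {', '.join(goals)}.")
    return clauses


c1 = pita_disjunctive(1, [("strong_sneezing", ["X"]), ("moderate_sneezing", ["X"])],
                      [0.3, 0.5], [("flu", ["X"])], ["X"])
print(c1[0])
```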

Example 4 

Clause C_1 from the LPAD of Example 1 is translated into

strong_sneezing(X, BDD) ← one(BB_0), flu(X, B_1), and(BB_0, B_1, BB_1),
    get_var_n(1, [X], [0.3, 0.5, 0.2], Var),
    equality(Var, 1, B), and(BB_1, B, BDD).

moderate_sneezing(X, BDD) ← one(BB_0), flu(X, B_1), and(BB_0, B_1, BB_1),
    get_var_n(1, [X], [0.3, 0.5, 0.2], Var),
    equality(Var, 2, B), and(BB_1, B, BDD).

while clause C_3 is translated into

flu(david, BDD) ← one(BDD).

In order to answer queries, the goal prob(Goal,P) is used, which is defined by

    prob(Goal, P) :-
        init, retractall(var(_, _, _)),
        add_bdd_arg(Goal, BDD, GoalBDD),
        (call(GoalBDD) -> ret_prob(BDD, P) ; P = 0.0),
        end.

where add_bdd_arg(Goal, BDD, GoalBDD) implements PITA_H(Goal). Moreover, various predicates of the LPAD should be declared as tabled. For a predicate p/n, the declaration is table p(_1,...,_n,or/3-zero/1), which indicates that answer subsumption is used to form the disjunction of multiple explanations. At a minimum, the predicate of the goal and all the predicates appearing in negative literals should be tabled with answer subsumption. As shown in Section 10, it is usually better to table every predicate whose answers have multiple explanations and are going to be reused often.

8 Correctness of PITA Evaluation 

In this section we show a result regarding the PITA transformation and its tabled evaluation on bounded term-size queries; this result takes as a starting point the well-definedness result of Theorem 2.

The main result of this section, Theorem 4, makes explicit mention of BDD data structures, which are considered to be ground terms for the purposes of formalization and are not specified further. Accordingly, the BDD operations used in the PITA transformation (and/3, or/3, not/2, one/1, zero/1, and equality/3) are all taken as (infinite) relations on terms, so that these predicates can be made part of a program's ground instantiation in the normal way. As a result, the ground instantiation of PITA(T) instantiates all variables in T with all BDD terms. Similarly, for the purposes of proving correctness, a ground program is assumed to be extended with the relation var(RuleName, [], Var) to associate a random variable with the identifier of each clause (see Appendix C for more details). Note that since Theorem 4 assumes a bounded term-size query, the semantics is well-defined, so the BDD and var/3 terms are finite. In other words, the representation of each explanation of each atom is finite, and each atom has a finite covering set of explanations.

Lemma 1 shows that the PITA transformation does not affect the property of a query being bounded term-size, a result that is used in the proof of Theorem 4.

Lemma 1 

Let T be an LPAD and Q a bounded term-size query to T. Then the query PITA_H(Q) to PITA(T) has bounded term-size.

Theorem 4 below states the correctness of the tabling implementation of PITA, since the BDD returned for a tabled query is the disjunction of a covering set of explanations for that query. The proof uses an extension of SLG evaluation that includes answer subsumption to collect explanations by disjoining BDDs, but that is restricted to the fixed-order dynamically stratified programs of Section 4. This formalism models the programs and implementation tested in Section 10.

Theorem 4 (Correctness of PITA Evaluation)

Let T be a fixed-order dynamically stratified LPAD and Q a ground bounded term-size atomic query. Then there is an SLG evaluation ℰ of PITA_H(Q) against PITA(T_Q), such that answer subsumption is declared on PITA_H(Q) using BDD-disjunction, where ℰ finitely terminates with an answer Ans for PITA_H(Q) and BDD(Ans) represents a covering set of explanations for Q.

9 Related Work 

(Mantadelis and Janssens 2010) presented an algorithm for answering queries to ProbLog programs that uses tabling. Our work differs from this in two important ways. The first is that we directly use XSB tabling with answer subsumption, while (Mantadelis and Janssens 2010) use some user-defined predicates that manipulate extra tabling data structures. The second difference is that in (Mantadelis and Janssens 2010) explanations are stored in trie data structures that are then translated into BDDs. When translating the tries into BDDs, the algorithm of (Mantadelis and Janssens 2010) finds shared substructures, i.e., sub-explanations shared by many explanations. By identifying shared structures, the construction of BDDs is sped up, since sub-explanations are transformed into BDDs only once. In our approach, we similarly exploit the repetition of structures, but we do it while finding explanations: by storing in the table the BDD representation of the explanations of each answer, every time the answer is reused its BDD does not have to be rebuilt. Thus our optimization is guided by the derivation of the query. Moreover, if a BDD is combined with another BDD that already contains the first as a subgraph, we rely on the highly optimized CUDD functions for the identification of the repetition and the simplification of the combining operation. In this way we exploit structure sharing as well, without the intermediate pass over the trie data structures.

10 Experiments 

PITA was tested on two datasets that contain function symbols: the first is taken from (Vennekens et al. 2004) and encodes a Hidden Markov Model (HMM), while the second, from (De Raedt et al. 2007), encodes biological networks. Moreover, it was also tested on the four testbeds of (Meert et al. 2009) that do not contain function symbols. PITA was compared with the exact version of ProbLog (De Raedt et al. 2007) available in the git version of Yap as of 10 November 2010, with the version of cplint (Riguzzi 2007) available in Yap 6.0, and with the version of CVE (Meert et al. 2009) available in ACE-ilProlog 1.2.20.

The first problem models a hidden Markov model with states 1, 2 and 3, of which 3 is an end state. This problem is encoded by the program

s(0,1):1/3 ∨ s(0,2):1/3 ∨ s(0,3):1/3.
s(T,1):1/3 ∨ s(T,2):1/3 ∨ s(T,3):1/3 ←
    T1 is T-1, T1>=0, s(T1,F), \+ s(T1,3).

For this experiment, we query the probability of the HMM being in state 1 at time N for increasing values of N, i.e., we query the probability of s(N,1). In PITA and ProbLog, we did not use reordering of BDD variables.* In PITA we tabled s/2, and in ProbLog we tabled the same predicate using the technique described in (Mantadelis and Janssens 2010). The execution times of PITA, ProbLog, CVE and cplint are shown in Figure 2. In this problem tabling provides an impressive speedup, since computations can be reused often.
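For this particular HMM the query has a simple closed form, which makes a useful sanity check for any implementation. In each world at most one atom s(T,F) holds for every time T, so the clause fires only if the chain has not yet reached state 3, giving P(s(N,1)) = (2/3)^N · 1/3. The Python sketch below (our own derivation for illustration, not a result reported in the experiments) computes it with exact fractions, keeping the distribution for every time step in a list so that step T reuses step T-1, which is the kind of reuse that tabling provides.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def prob_state1(n: int) -> Fraction:
    """P(s(n, 1)) for the HMM program above, computed step by step."""
    table = [(THIRD, THIRD, THIRD)]  # table[t] = (P(s(t,1)), P(s(t,2)), P(s(t,3)))
    for _ in range(n):
        p1, p2, _p3 = table[-1]
        alive = p1 + p2              # s(T1,F) holds with F != 3, i.e. \+ s(T1,3)
        table.append((alive * THIRD,) * 3)
    return table[n][0]


print(prob_state1(2))  # 4/27
```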

All experiments were performed on Linux machines with an Intel Core 2 Duo E6550 (2333 MHz) 
processor and 4 GB of RAM. 

* For each experiment with PITA and ProbLog, we used either group sift automatic reordering or no reordering of BDD variables, depending on which gave the best results.




Fig. 2. Hidden Markov model. 



The biological network programs compute the probability of a path in a large graph in which the nodes encode biological entities and the links represent conceptual relations among them. Each program in this dataset contains a non-probabilistic definition of path plus a number of links represented by probabilistic facts. The programs have been sampled from a very large graph and contain 200, 400, ..., 10000 edges. Sampling was repeated ten times, to obtain ten series of programs of increasing size. In each program we query the probability that the two genes HGNC_620 and HGNC_983 are related. We used two definitions of path. The first, from (Kimmig et al. 2011), performs loop checking explicitly by keeping the list of visited nodes:



path(X, Y) ← path(X, Y, [X], Z).
path(X, Y, V, [Y|V]) ← arc(X, Y).
path(X, Y, V0, V1) ← arc(X, Z), append(V0, _S, V1),
    \+ member(Z, V0), path(Z, Y, [Z|V0], V1).
arc(X, Y) ← edge(X, Y).
arc(X, Y) ← edge(Y, X).                                  (3)

The second exploits tabling for performing loop checking:

path(X, X).
path(X, Y) ← path(X, Z), arc(Z, Y).
arc(X, Y) ← edge(X, Y).
arc(X, Y) ← edge(Y, X).                                  (4)

The possibility of using lists (which require function symbols) allowed more modeling freedom in this case. In PITA, the predicates path/2, edge/2 and arc/2 are tabled in both cases. For ProbLog, we used its implementation of tabling for loop checking in the second program; as in PITA, path/2, edge/2 and arc/2 are tabled.
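As a reference point for what PITA computes on these programs, the Python sketch below obtains P(path(src,dst)) for a tiny probabilistic graph by enumerating every possible world of the edge/2 facts and checking reachability with a visited set (the loop check both definitions need). It is exponential in the number of edges and is meant only for validating results on toy graphs; the function name and input format are hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple


def path_prob(edges: Dict[Tuple[str, str], float], src: str, dst: str) -> float:
    """P(path(src, dst)) by enumerating all 2^m worlds of the probabilistic edges.

    arc/2 is symmetric, as in both path programs above.
    """
    names = list(edges)
    total = 0.0
    for present in product([True, False], repeat=len(names)):
        weight = 1.0
        adj: Dict[str, Set[str]] = {}
        for (a, b), keep in zip(names, present):
            p = edges[(a, b)]
            weight *= p if keep else 1.0 - p
            if keep:
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
        # Reachability with a visited set, so cycles cannot cause looping.
        seen, stack = {src}, [src]
        while stack:
            for nxt in adj.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if dst in seen:
            total += weight
    return total


graph = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.5}
p_ac = path_prob(graph, "a", "c")  # 0.5 + 0.5 * 0.9 * 0.8, i.e. about 0.86
```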

We ran PITA, ProbLog and cplint on the graphs starting from the smallest program. In each series we stopped after one day or at the first graph for which the program ended for lack of memory.* In cplint, PITA and ProbLog we used group sift reordering of BDD variables. Figure 3(a) shows the number of subgraphs for which each algorithm was able to answer the query as a function of the size of the subgraphs, while Figure 3(b) shows the execution time averaged over all and only the subgraphs for which all the algorithms succeeded. Figure 4, instead, shows the execution times averaged, for each algorithm, over all the graphs on which that algorithm succeeded. In these figures, PITA and PITAt refer to PITA applied to path programs (3) and (4) respectively, and similarly for ProbLog and ProbLogt.

Fig. 3. Biological graph experiments: (a) number of successes; (b) average execution times on the graphs on which all the algorithms succeeded.

Fig. 4. Average execution times on the biological graph experiments.

Fig. 5. Datasets from (Meert et al. 2009).

* CVE was not applied to this dataset because the current version cannot handle graph cycles.

PITA applied to program (3) was able to solve more subgraphs, and in a shorter time, than cplint and all cases of ProbLog. On path definition (4), on the other hand, ProbLogt was able to solve a larger number of problems than PITAt, and in a shorter time. For PITA, the vast majority of time for larger graphs was spent on BDD maintenance. This shows that, even if tabling consumes more memory when finding the explanations, BDDs are built faster and use less memory, probably because tabling allows less redundancy (only one BDD is stored for an answer) and supports a bottom-up construction of the BDDs, which is usually better.

The four datasets of (Meert et al. 2009) served as a final suite of benchmarks: bloodtype encodes the genetic inheritance of blood type, growingbody contains programs with growing bodies, growinghead contains programs with growing heads, and uwcse encodes a university domain. The best results for ProbLog were obtained by using ProbLog's tabling in all experiments except growinghead. The execution times of cplint, ProbLog, CVE and PITA are shown in Figures 5(a), 5(b), 6(a) and 6(b). In the legend, PITA means that dynamic BDD variable reordering was disabled, while PITAdr has group sift automatic reordering enabled; similarly for ProbLog and ProbLogdr.

In bloodtype, growingbody and growinghead, PITA without variable reordering was the fastest, while in uwcse PITA with group sift automatic reordering was the fastest. These results show that variable reordering has a strong impact on performance: if the variable order obtained as a consequence of the sequence of BDD operations is already good, automatic reordering severely hinders performance. Fully understanding the effect of variable reordering on performance is a subject of future work.



For the missing points at the beginning of the lines a time smaller than 10 ^ was recorded. For 
the missing points at the end of the lines the algorithm exhausted the available memory. 
11 Conclusion and Future Works

This paper has made two main contributions. The first is the identification of bounded term-size programs and queries as conditions for the distribution semantics to be well-defined when LPADs contain function symbols. As shown in Section 4.2, bounded term-size programs and queries sometimes include programs that other termination classes do not. Given the transformational equivalence of LPADs and other probabilistic logic programming formalisms that use the distribution semantics, these results may form a basis for determining well-definedness beyond LPADs.

As a second contribution, the PITA transformation provides a practical reasoning algorithm that was directly used in the experiments of Section 10. The experiments substantiate the PITA approach. Moreover, PITA should be easily portable to other tabling engines, such as those of YAP, Ciao and B-Prolog, if they support answer subsumption over general semi-lattices. PITA is available in XSB Version 3.3 and later, downloadable from http://xsb.sourceforge.net. A user manual is included in the XSB manual and can also be found at http://sites.unife.it/ml/pita.

In the future, we plan to extend PITA to the whole class of sound LPADs by implementing the SLG Delaying and Simplification operations for answer subsumption; an implementation of tabling with term-depth abstraction (Section 6) is also underway. Finally, we are developing a version of PITA that is able to answer queries in an approximate way, similarly to (Kimmig et al. 2011).
Appendix A Proof of Well-Definedness Theorems (Section 4.1)

To prove Theorem 1 we start with a lemma that states one half of the equivalence, 
and also describes an implication of the bounded term-size property for computa- 
tion. 

Lemma 2 

Let P be a normal program with the bounded term-size property. Then 

1. Any atom in WFM(P) has a finite stratum, and was computed by a finite number of applications of True^P_I.

2. There are a finite number of true atoms in WFM{P). 

Proof 

For 2), note that bounding the size of θ as used in Definition 2 bounds the size of the ground clause B ← L_1, ..., L_n, and so bounds the size of True^P_I(T_r) for any I, T_r ⊆ H_P. Since the true atoms in WFM(P) are defined as a fixed point of True^P_I for a given I, there must be a finite number of them.

Similarly, since the size of θ is bounded by an integer L, and since True^P_I is monotonic for any I, True^P_I(∅) reaches its fixed point in a finite number of applications, and in fact only a finite number of applications of True^P_I are required to compute the true atoms in WFM(P). In addition, it can be the case that T^P_I ≠ I only a finite number of times, so that WFM(P) can contain only a finite number of strata. □

Theorem 1 

Let P be a normal program. Then WFM (P) has a finite number of true atoms iff 
P has the bounded term-size property. 

Proof 

The ⇐ implication was shown by the previous lemma, so it remains to prove that if WFM(P) has a finite number of true atoms, then P has the bounded term-size property. Since the number of true atoms in WFM(P) is finite, all derivations of true atoms using True^P_I(T_r) of Definition 2 can be constructed using only a finite set of ground clauses. For this to be possible, the maximum term size of any literal in any such clause must be finitely bounded, so that P has the bounded term-size property. □

Theorem 2 

Let T be a sound bounded term-size LPAD, and let A ∈ H_T. Then A has a finite set of finite explanations that is covering.
Proof

Let T be an LPAD and w be a world of T. Each clause C_ground in w is associated with a choice (C, θ, i), for which C and i can both be taken as finite integers. We call (C, θ, i) the generator of C_ground. By Theorem 1, each world w of T has a finite number of true atoms, and hence a maximum size L_w of any atom in that world. We prove that the maximum L_T of the bounds L_w over all such worlds has a finite upper bound.

We first consider the case in which T does not contain negation. Consider a world w whose well-founded model has the finite bound L_w on the size of the largest atoms. We show that L_w cannot be arbitrarily large.

Since L_w is finite, all facts in T must be ground and all clauses range-restricted: otherwise some possible world of T would contain an infinite number of true atoms and so would not be bounded term-size by Theorem 1. There must be some set G of generators which acts on a chain of interpretations I_0 ⊂ I_1 ⊂ ... ⊂ I_n ⊆ WFM(w), where I_0 is some superset of the facts in w, and the maximum size of any atom in I_i is strictly increasing. Because WFM(w) is finite and T is definite, the set of generators G must be finite.

We first show that G must contain generators (C', θ, i) and (C', θ', j) for at least one disjunctive clause C'. If not, then either 1) L_w would be infinite, as there would be some recursion in which term size increases indefinitely; or 2) if there is no such recursion that indefinitely increases the size of terms and no disjunctive clauses, L_w could not be arbitrarily large and this would prove the property. In fact, without disjunction the set of clauses causing the recursion would produce an infinite model. With disjunction, eventually a different head is chosen and the recursion is stopped.

Consider then, for some set D of disjunctive clauses, the set D_expand of generators that must be used to derive (perhaps indirectly) atoms whose size is strictly greater than the maximal size of an atom in I_n, while another set of generators D_stop must be used to stop the production of larger atoms, since WFM(w) is finite. However, if such a situation were the case, there must also be a world w_inf in which, for ground clauses for D whose grounding substitution is over a certain size, only the set D_expand of generators is chosen and D_stop is never chosen. The well-founded model for w_inf would then be infinite, against the hypothesis that T is bounded term-size.

The preceding argument has shown that since there is an overall bound on the 
size of the largest atom in any world for T, T has a finite number of different 
models, each of which is finite. As each model is finite, there is a finite number of 
ground clauses that determine each model by deriving the positive atoms in the 
model. Each such clause is associated with an atomic choice, and the set of these 
clauses corresponds to a finite composite choice. The set of these composite choices
corresponding to models in which the query A is true represents a finite set of finite
explanations that is covering for A.

Although the preceding paragraph assumed that T did not contain negation, the
assumption was made only for simplicity, so that details of strata need not be
considered. The argument for normal programs is essentially the same, constituting an
induction where the above argument is made for each stratum. Because Definition 2
specifies that an atom can be added to an interpretation only once, there can only
be a finite number of strata in which some true atom is added, so that there will be
only a finite number of strata overall. Since there are only a finite number of strata,
each of which has a finite number of applications of $True^{T}_{I}(Tr)$, a finite bound $L$
can be constructed so that T fulfills the definition of bounded term-size. □
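For a finite, function-free program, the conclusion of the argument above can be checked directly: each world has a finite model, and the composite choices of the worlds in which the query is true form a finite covering set of finite explanations. The sketch below is only an illustration under those assumptions (no function symbols, no negation, a ground program written as Python data); the names `CLAUSES`, `head_options`, `least_model` and `covering_choices` are hypothetical. It enumerates every world by choosing one head per ground clause (with an implicit null head for leftover probability), computes the least model, and sums the probabilities of the worlds where the query holds.

```python
from itertools import product

# Hypothetical ground, function-free, negation-free LPAD. Each clause is
# (heads, body): heads is a list of (atom, probability) pairs and body is a
# list of atoms. Probability mass not assigned to any head goes to an
# implicit null head that derives nothing.
CLAUSES = [
    ([("heads1", 0.5), ("tails1", 0.5)], []),  # coin 1
    ([("heads2", 0.6)], []),                   # coin 2
    ([("win", 0.8)], ["heads1"]),
    ([("win", 1.0)], ["heads2"]),
]


def head_options(heads):
    """(head index, probability) pairs; index None stands for the null head."""
    opts = [(i, p) for i, (_, p) in enumerate(heads)]
    rest = 1.0 - sum(p for _, p in heads)
    if rest > 1e-9:
        opts.append((None, rest))
    return opts


def least_model(rules):
    """Least model of a definite ground program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model


def covering_choices(clauses, query):
    """Composite choices of the worlds where `query` is true, and their total probability."""
    found, prob = [], 0.0
    for selection in product(*(head_options(heads) for heads, _ in clauses)):
        rules, p = [], 1.0
        for (heads, body), (i, pi) in zip(clauses, selection):
            p *= pi
            if i is not None:
                rules.append((heads[i][0], body))
        if query in least_model(rules):
            # Atomic choices are recorded as (clause index, head index) pairs.
            found.append(tuple((k, i) for k, (i, _) in enumerate(selection)))
            prob += p
    return found, prob


choices, p_win = covering_choices(CLAUSES, "win")
print(len(choices), round(p_win, 6))  # prints: 5 0.76
```

Enumerating worlds is only feasible here because the program is finite; the point of the theorem is that bounded term-size guarantees a finite covering set even when function symbols make the grounding infinite.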

Appendix B Proof of the Termination Theorem for Tabling (Section 6) 

Theorem 3 

Let P be a fixed-order dynamically stratified normal program, and Q a bounded
term-size query to P. Then there is an SLG evaluation of Q to P using term-depth
abstraction that finitely terminates.

Proof 

SLG has been proven to terminate for other notions of bounded term-size queries, 
so here we only sketch the termination proof. 

First, we note that (Sagonas et al. 2000) guarantees that if P is a fixed-order
stratified program, then there is an SLG evaluation $\mathcal{E}$ of P that does not
require the use of the SLG Delaying, Simplification or Answer Completion
operations, and by implication no forest of $\mathcal{E}$ contains a conditional answer. Such
an evaluation is termed delay-minimal. Note that Definition 4 constrains only the
bindings used in $True^{P}_{I}$, and these constraints may not apply to ground atoms that
are undefined in $WFM(P)$. As a result, conditional answers, if they are not
simplified or removed by Simplification or Answer Completion, may not have a
bounded term-size. This situation is avoided by delay-minimal evaluations. Next,
we assume that all negative selected literals are ground. This assumption causes
no loss of generality, as the evaluation will flounder and so terminate finitely if
a non-ground negative literal is selected. Given this context, the proof uses the
forest-of-trees model (Swift 1999a) of SLG (Chen and Warren 1996).

• We consider as an induction basis the case when Q is in the lowest stratum, that is, when
Q can be derived without clauses that contain negative literals, or is part of an
unfounded set S of atoms and clauses for atoms in S do not contain negative
literals. As argued in Section 6, the use of term-depth abstraction ensures that an
SLG evaluation $\mathcal{E}$ of a query Q to a program with bounded term-size has only a finite
number of trees. In addition, since SLG works on the original clauses of a program
P and P is finite (although ground(P) may not be), there can be only a finite
number of clauses resolvable against the root of any tree via Program Clause
Resolution, and so the root of each SLG tree can have only a finite number of
children. Finally, to show that each interior node has a finite number of children,
we observe that there can only be a finite number of answers to any subgoal upon
which Q depends. This follows from the fact that $\mathcal{E}$ is delay-minimal and so produces
no conditional answers, together with the bound of Definition 4 that ensures a
program is bounded term-size. As a result, there are only a finite number of nodes
that are produced through Answer Return. These observations together ensure
that each tree in any SLG forest of $\mathcal{E}$ is finite. Since each operation (including the
SLG Completion operation, which does not add nodes to a forest) is applicable
only once to a given node or set of nodes in an evaluation (i.e. executing an
SLG operation removes the conditions for its applicability), the evaluation $\mathcal{E}$ itself
must be finite and the statement holds for the induction basis.
• For the induction step, we assume the statement holds for queries whose (fixed-order)
dynamic stratum is less than $k$, and show that it also holds for a query
Q at stratum $k$. As indicated above, we use a delay-minimal SLG evaluation
$\mathcal{E}$ that does not require Delaying, Simplification or Answer Completion
operations. For the induction case, the various SLG operations that do not include
negation will only produce a finite number of trees and a finite number of nodes
in each tree, as described in the induction basis. However, if there is a node $N$ in a
forest with a selected negative literal $\neg A$, the SLG operation Negative Return
is applicable. In this case, a single child will be produced for $N$ and no further
operations will be applicable to $N$. Thus any forest in $\mathcal{E}$ will have a finite number
of finite trees, and since each operation can be applied only once to each node, as before
$\mathcal{E}$ will be finite, so that the statement holds by induction.

□ 
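The induction basis relies on term-depth abstraction to keep the number of SLG trees finite: any call whose terms are deeper than a fixed bound is replaced by a more general call, so infinitely many concrete calls collapse into finitely many tables. A minimal sketch of that idea, assuming terms are encoded as nested Python tuples `(functor, arg1, ..., argN)` with constants as strings; the depth convention and the names `Var`, `abstract`, `shape` and `nat` are illustrative and are not XSB's actual implementation.

```python
import itertools


class Var:
    """A fresh logic variable; every instance is distinct."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(Var._ids)

    def __repr__(self):
        return f"_G{self.id}"


def abstract(term, depth):
    """Keep `depth` levels of the term and replace deeper subterms by fresh variables."""
    if depth == 0:
        return Var()
    if isinstance(term, tuple):
        functor, *args = term
        return (functor, *(abstract(a, depth - 1) for a in args))
    return term


def shape(term):
    """Comparison key ignoring variable identities (enough here, since
    abstracting a ground term never produces shared variables)."""
    if isinstance(term, Var):
        return "_"
    if isinstance(term, tuple):
        return tuple(shape(t) for t in term)
    return term


def nat(n):
    """The ground term s(s(...s(0)...)) with n occurrences of s."""
    term = "0"
    for _ in range(n):
        term = ("s", term)
    return term


# The calls p(s^n(0)) for n = 0..99 collapse into finitely many abstracted calls.
calls = {shape(abstract(("p", nat(n)), 4)) for n in range(100)}
print(len(calls))  # prints: 4
```

With a depth bound of 4, only the calls on `0`, `s(0)`, `s(s(0))` and the abstracted `s(s(s(_)))` remain distinct; bounded term-size is what ensures that the answers returned to these general calls stay finite as well.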

Appendix C Proof of the Correctness Theorems for PITA (Section 8) 

The next theorem addresses the correctness of the PITA evaluation. As discussed in 
Section 8, the BDDs of the PITA transformation are represented as ground terms, 
while BDD operations, such as and/3, or/3 etc. are infinite relations on such terms. 
The PITA transformation also uses the predicate get_var_n/4 whose definition in 
Section 7 is: 

    get_var_n(R, S, Probs, Var) :-
        (   var(R, S, Var)
        ->  true
        ;   length(Probs, L),
            add_var(L, Probs, Var),
            assert(var(R, S, Var))
        ).

This definition uses a non-logical update of the program, and so without
modifications it is not suitable for our proofs below. Instead, we assume that ground(T)
is augmented with a (potentially infinite) number of facts of the form var(R, [], Var)
for each ground rule R (note that no variable instantiation is needed in the second
argument of var/3 if it is indexed on ground rule names). Clearly, the
augmentation of T by such facts has the same meaning as get_var_n/4, but is simply done
by an a priori program extension rather than during the computation as in the
implementation.
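The replacement of get_var_n/4 by a priori var/3 facts is sound because the relation built lazily by get_var_n/4 is functional: each (R, S) pair is mapped to exactly one variable, whenever it is first requested. The following hypothetical Python stand-in (the class `VarTable` and its `domains` list are illustrative names, not part of PITA) makes that memoization explicit: a dictionary plays the role of the asserted var/3 facts, and appending to `domains` plays the role of add_var/3.

```python
class VarTable:
    """Hypothetical stand-in for get_var_n/4: maps a (rule, substitution)
    pair to a multi-valued BDD variable, allocating it on first use."""

    def __init__(self):
        self._table = {}    # (rule, substitution) -> variable index, like var/3
        self.domains = []   # variable index -> head probabilities, like add_var/3

    def get_var_n(self, rule, subst, probs):
        key = (rule, tuple(subst))
        if key not in self._table:                     # var(R, S, Var) fails
            self.domains.append(list(probs))           # add_var(L, Probs, Var)
            self._table[key] = len(self.domains) - 1   # assert(var(R, S, Var))
        return self._table[key]


table = VarTable()
v1 = table.get_var_n("r1", ["a"], [0.3, 0.7])
v2 = table.get_var_n("r1", ["b"], [0.3, 0.7])
v3 = table.get_var_n("r1", ["a"], [0.3, 0.7])
print(v1, v2, v3)  # prints: 0 1 0
```

Because repeated requests for the same key always return the same variable, precomputing every entry up front would yield the same answers; only the order in which variable indices are assigned differs.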
Lemma 1 

Let T be an LPAD and Q a bounded term-size query to T. Then the query
$PITA_H(Q)$ to $PITA(T)$ has bounded term-size.

Proof 

Although $T_Q$ (Definition 6) has bounded term-size, we also need to ensure that
$PITA(T_Q)$ has bounded term-size, given the addition of the BDD relations and/3,
or/3, etc. along with the var/3 relations mentioned above.
Both var/3 and the BDD relations are functional on their input arguments (i.e.
the first two arguments of var/3, and/3, or/3, etc.; cf. Section 7). Therefore, for
the body of a clause C that was true in an application of $True^{T_Q}_{I}$, there are exactly
n bodies that are true in an application of $True^{PITA(T_Q)}_{I}$, where n is the number
of heads of C. Thus the size of every ground substitution in every iteration of
$True^{PITA(T_Q)}_{I}$ is bounded as well.

Note that since $PITA(T)$ and $PITA_H(Q)$ are both syntactic transformations,
the lemma applies even if the LPAD is not sound. □

Theorem 4 

Let T be a fixed-order dynamically stratified LPAD and Q a ground bounded
term-size atomic query. Then there is an SLG evaluation $\mathcal{E}$ of $PITA_H(Q)$ against
$PITA(T_Q)$, with answer subsumption declared on $PITA_H(Q)$ using BDD
disjunction, such that $\mathcal{E}$ finitely terminates with an answer Ans for $PITA_H(Q)$ and
BDD(Ans) represents a covering set of explanations for Q.

Proof 

(Sketch) The proof uses the forest-of-trees model (Swift 1999a) of SLG (Chen and Warren 1996).

Because T is fixed-order dynamically stratified, queries to T can be evaluated
using SLG without the Delaying, Simplification or Answer Completion
operations. Instead, as (Sagonas et al. 2000) shows, only the SLG operations New
Subgoal, Program Clause Resolution, Answer Return and Negative
Return are needed. Since T is fixed-order dynamically stratified, it is immediate from
inspecting the transformations of Section 7, together with the fact that the BDD
relations are functional, that PITA(T) is also fixed-order dynamically stratified, as
is $PITA(T)_Q$.

However, Theorem 3 must be extended to evaluations that include answer
subsumption, which we capture with a new operation Answer Join to perform answer
subsumption over an upper semi-lattice $L$. Without loss of generality we assume
that a given predicate of arity $m > 0$ has had answer subsumption declared on its
$m$-th argument, and we term the first $m - 1$ arguments non-subsuming arguments.
We recall that a node N is an answer in an SLG tree $\mathcal{T}$ if it has no unresolved
goals and is a leaf in $\mathcal{T}$. Accordingly, creating a child of N with a special marker
fail is a method to effectively delete an answer (cf. Swift 1999a).

• Answer Join: Let an SLG forest $\mathcal{F}_i$ contain an answer node

$N = Ans \leftarrow$

where the predicate for Ans has been declared to use answer subsumption over
a lattice $L$ for which the join operation is decidable, and let the arity of Ans be
$m > 0$. Further, let $A$ be the set of all answers in $\mathcal{F}_i$ that are in the same tree, $\mathcal{T}_N$,
as N and for which the non-subsuming arguments are the same as those of Ans. Let Join
be the $L$-join of the final arguments of all answers in $A$.

- If $(Ans \leftarrow)\{arg(m, Ans)/Join\}$ is not an answer in $\mathcal{T}_N$, add it as a child of
N, and add the child fail to all other answers in $A$.

- Otherwise, if $(Ans \leftarrow)\{arg(m, Ans)/Join\}$ is an answer in $\mathcal{T}_N$, create a child
fail for N.
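The Answer Join operation can be sketched as a small answer table that is joined immediately after each new answer is derived. This is a simplified illustration, not XSB's tabling engine: answers are mutable `[key, value, deleted]` lists in which `key` holds the non-subsuming arguments, and setting `deleted` stands in for adding a `fail` child. The class name `SubsumingTable` is hypothetical, and leaving a lone answer untouched (it is already its own join) is an assumption of the sketch. Set union on sets of explanations plays the role of BDD disjunction.

```python
class SubsumingTable:
    """Simplified answer table for one tabled subgoal whose last argument
    uses answer subsumption over a lattice with binary join `join`."""

    def __init__(self, join):
        self.join = join
        self.answers = []   # [key, value, deleted] triples

    def live(self, key):
        """Non-deleted answers with the given non-subsuming arguments."""
        return [a for a in self.answers if a[0] == key and not a[2]]

    def add_answer(self, key, value):
        node = [key, value, False]
        self.answers.append(node)
        self.answer_join(node)   # applied right after each new answer

    def answer_join(self, node):
        group = self.live(node[0])            # the set A (contains node)
        joined = group[0][1]
        for a in group[1:]:
            joined = self.join(joined, a[1])
        others = [a for a in group if a is not node]
        if any(a[1] == joined for a in others):
            node[2] = True                    # join is already an answer: delete N
        elif others:
            for a in group:                   # delete every answer in A ...
                a[2] = True
            self.answers.append([node[0], joined, False])  # ... and add the join
        # A lone answer is already its own join, so nothing changes.


t = SubsumingTable(lambda x, y: x | y)
t.add_answer(("q",), frozenset({"e1"}))
t.add_answer(("q",), frozenset({"e2"}))
t.add_answer(("q",), frozenset({"e1"}))
print([a[1] for a in t.live(("q",))])  # a single answer holding e1 and e2
```

Each call either deletes the new answer or replaces the whole group by its join, so at most one live answer remains per key, and every step that changes the table deletes at least one answer.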
For the proof, the first item to note is that since $T_Q$ is bounded term-size, any clauses
on which Q depends that give rise to true atoms in the well-founded model of any
world of T must be range-restricted; otherwise, since T has function symbols,
$T_Q$ would have an infinite model and not be bounded term-size. Given this, it is
then straightforward to show that $PITA(T)_Q$ is also range-restricted and that any
answer A of $PITA_H(Q)$ will be ground (cf. Muggleton 2000). Accordingly, the
operation Answer Join will be applicable to any subgoal with a non-empty set of
answers.
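Range-restriction is what rules out clauses such as `q(X) ; r(X).`, which, once function symbols are present, have true instances for infinitely many terms. A minimal sketch of the syntactic check, assuming the usual definition (every variable of the possibly disjunctive head occurs in a positive body literal) and a hypothetical encoding in which compound terms are tuples `(functor, *args)` and strings starting with an uppercase letter or `_` are variables; the helper names are illustrative.

```python
def variables(term):
    """Variables of a term in the tuple encoding described above."""
    if isinstance(term, tuple):
        return set().union(*(variables(a) for a in term[1:]))
    if isinstance(term, str) and (term[:1].isupper() or term.startswith("_")):
        return {term}
    return set()


def range_restricted(heads, positive_body):
    """True if every variable in the head atoms occurs in a positive body literal."""
    head_vars = set().union(*(variables(h) for h in heads))
    body_vars = set().union(*(variables(b) for b in positive_body))
    return head_vars <= body_vars


# p(s(X)) :- p(X).   every head variable is bound by the body
print(range_restricted([("p", ("s", "X"))], [("p", "X")]))  # prints: True
# q(X) ; r(X).       X is unconstrained, so ground instances are unbounded
print(range_restricted([("q", "X"), ("r", "X")], []))       # prints: False
```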

We extend Theorem 3 and Lemma 1 to show that since $PITA(T)_Q$ has the
bounded term-size property, an SLG evaluation of a query $PITA_H(Q)$ to $PITA(T)_Q$
will terminate. Because the join operation for $L$ is decidable, computation of the
join will not affect termination properties. Let $\mathcal{T}_N$ be a tree whose root subgoal
is a predicate that uses answer subsumption. Then each time a new answer node N
is added to $\mathcal{T}_N$ there will be one new Answer Join operation that becomes
applicable for N. Let $A$ be the set of answers in $\mathcal{T}_N$ as in the definition of Answer
Join. Then applying the Answer Join operation will either 1) create a child of
N that is a new answer and "delete" $|A|$ answers by creating children for them of
the form fail; or 2) "delete" the answer N by creating a child fail of N. Clearly
any answer can be deleted at most once, and each application of the Answer Join
operation will delete at least one answer in $\mathcal{T}_N$. Accordingly, if $\mathcal{T}_N$ contains Num
answers, there can be at most Num applications of Answer Join for answers
in $\mathcal{T}_N$. Using these considerations it is straightforward to show that termination
of bounded term-size programs holds for SLG evaluations extended with answer
subsumption (see the note below).

Note: As an aside, because Answer Join deletes all answers in $A$ except
the join, it can be shown by induction that immediately after an Answer Join operation is
applied to Ans in a tree $\mathcal{T}_N$, there will be only one "non-deleted" answer in $\mathcal{T}_N$ with the same
non-subsuming bindings as Ans. Accordingly, if the cost of computing the join is constant,
the total cost of Num Answer Join operations will be proportional to Num. Based on this observation, the
implementation of PITA can be thought of as applying an Answer Join operation immediately
after a new answer is derived, in order to avoid returning answers that are not optimal given
the current state of the computation.

Thus, the bounded term-size property of $PITA(T)_Q$ together with Theorem 2
implies that there will be a finite set of finite explanations for $PITA_H(Q)$, and the
preceding argument shows that SLG extended with Answer Join will terminate
on the query $PITA_H(Q)$. It remains to show that an answer Ans for $PITA_H(Q)$ in
the final state of $\mathcal{E}$ is such that BDD(Ans) represents a covering set of explanations
for Q. That BDD(Ans) contains a covering set of explanations can be shown by
induction on the number of BDD operations. For the induction basis it is easy to
see that the operations zero/1 and one/1 are covering for false and true atoms
respectively.

• Consider an "and" operation in the body of a clause. For the inductive assumption,
$BB_{i-1}$ and $B_i$ both represent finite sets of explanations covering for $L_1, \ldots, L_{i-1}$ and
$L_i$ respectively. Let $F_{i-1}$, $F'_i$, and $F_i$ be the formulas expressed by $BB_{i-1}$, $B_i$, and
$BB_i$ respectively. These formulas can be represented in disjunctive normal form,
in which every disjunct represents an explanation. $F_i$ is obtained by multiplying
$F_{i-1}$ and $F'_i$, so by algebraic manipulation we can obtain a formula in disjunctive
normal form in which every disjunct is the conjunction of two disjuncts, one from
$F_{i-1}$ and one from $F'_i$. Every disjunct is thus an explanation for the body prefix
up to and including $L_i$. Moreover, every disjunct for $F_i$ is obtained by conjoining a
disjunct for $F_{i-1}$ with a disjunct for $F'_i$.

• In the case of a "not" operation in the body of a clause, let $L_i$ be the negative
literal $\neg D$. Then, for $BN_i$ the BDD produced by D, not($BN_i$, $B_i$) simply negates
this BDD to produce a covering set of explanations for $\neg D$.

• In the case of an "or" operation between two answers, the resulting BDD
represents the union of the sets of explanations represented by the BDDs that are
joined.

Since the property holds both for the induction basis and the induction step, the
set of explanations represented by BDD(Ans) is covering for the query. □
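The "and" and "or" cases of the induction can be mimicked with explicit sets of explanations in disjunctive normal form instead of BDDs. This is a sketch under stated assumptions, not PITA's BDD code. An atomic choice is a `(variable, value)` pair, an explanation is a frozenset of such pairs, and a formula is a frozenset of explanations. `bdd_and` forms every consistent pairwise conjunction, `bdd_or` takes the union, and `ZERO`/`ONE` play the roles of zero/1 and one/1. The "not" case is omitted, since complementing an explicit DNF is exactly the work that BDDs make cheap. All names are hypothetical, and probabilities are computed by brute-force world enumeration. The example encodes the hypothetical program `heads1:0.5 ; tails1:0.5.`, `heads2:0.6.`, `win:0.8 :- heads1.`, `win :- heads2.` with one variable per clause.

```python
from itertools import product

ZERO = frozenset()                # no explanations: covering for a false atom
ONE = frozenset({frozenset()})    # the empty explanation: covering for a true atom


def consistent(expl):
    """An explanation may not choose two values for the same variable."""
    names = [v for v, _ in expl]
    return len(names) == len(set(names))


def bdd_and(f, g):
    """Each disjunct of the result conjoins one disjunct of f with one of g."""
    return frozenset(e | h for e in f for h in g if consistent(e | h))


def bdd_or(f, g):
    """Union of the two sets of explanations."""
    return f | g


def choice(var, value):
    """Formula with the single explanation {var = value}."""
    return frozenset({frozenset({(var, value)})})


def probability(f, domains):
    """Sum the probabilities of the worlds that extend some explanation in f.
    Exponential in the number of variables; BDDs avoid this enumeration."""
    names = sorted(domains)
    total = 0.0
    for values in product(*(range(len(domains[v])) for v in names)):
        world = set(zip(names, values))
        if any(e <= world for e in f):
            p = 1.0
            for v, i in zip(names, values):
                p *= domains[v][i]
            total += p
    return total


# One variable per clause; the last value of a domain may be the null head.
domains = {"X1": [0.5, 0.5], "X2": [0.6, 0.4], "X3": [0.8, 0.2], "X4": [1.0]}
via_heads1 = bdd_and(bdd_and(ONE, choice("X1", 0)), choice("X3", 0))
via_heads2 = bdd_and(bdd_and(ONE, choice("X2", 0)), choice("X4", 0))
win = bdd_or(via_heads1, via_heads2)
print(round(probability(win, domains), 6))  # prints: 0.76
```

Here `win` contains exactly two explanations, one per clause for `win`, and they overlap on some worlds, which is why probabilities must be summed over worlds (or over a BDD) rather than over explanations.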



